On the smoothing of multinomial estimates using Liouville mixture models and applications

Bouguila, Nizar

doi:10.1007/s10044-011-0236-8


Theoretical Advances
Published: 06 September 2011

Volume 16, pages 349–363 (2013)

Pattern Analysis and Applications


Abstract

There has been major progress in recent years in statistical model-based pattern recognition, data mining and knowledge discovery. In particular, generative models are widely used and perform reliably overall. The success of these models hinges on their ability to construct a representation that captures the underlying statistical distribution of the data. In this article, we focus on count data modeling, since this kind of data arises naturally in many contexts and application domains. Models based on the multinomial assumption are usually used in this case, but they may have several shortcomings, especially for high-dimensional sparse data. We therefore propose a principled approach to smoothing multinomials using a mixture of Beta-Liouville distributions, learned to reflect and model prior beliefs about the multinomial parameters. Through both theoretical interpretations and experimental validations, we argue that the proposed smoothing model is general and flexible enough to represent count data accurately.

Notes

The problem of data sparseness is also known as the zero-frequency problem [10].
It is easy to see from Eq. 1 that zero counts create serious numerical problems.
A geometric interpretation of $\sum_{v=1}^{V}\alpha_v$ has been proposed in [29].
In particular, the authors of [41] provide an interesting discussion of the difference between Bayesian and empirical Bayesian approaches.
Chang C-C, Lin C-J LIBSVM: a library for support vector machines. Available at http://www.csie.ntu.edu.tw/~cjlin/libsv.

References

[1] Brodley CE, Smyth P (1997) Applying classification algorithms in practice. Stat Comput 7(1):45–56
[2] Bouguila N, Ziou D, Vaillancourt J (2003) Novel mixtures based on the Dirichlet distribution: application to data and image classification. In: Perner P, Rosenfeld A (eds) Machine learning and data mining in pattern recognition (MLDM). LNAI, vol 2734. Springer, Berlin, pp 172–181
[3] Vijaya PA, Murty MN, Subramanian DK (2006) Efficient median based clustering and classification techniques for protein sequences. Pattern Anal Appl 9(2–3):243–255
[4] Dagan I, Lee L, Pereira FCN (1999) Similarity-based models of word cooccurrence probabilities. Mach Learn 34(1–3):43–69
[5] Scott S, Matwin S (1999) Feature engineering for text classification. In: Proceedings of the international conference on machine learning (ICML), pp 379–388
[6] Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, 8th European conference on computer vision (ECCV)
[7] Leung T, Malik J (2001) Representing and recognizing the visual appearance of materials using three-dimensional textons. Int J Comput Vis 43(1):29–44
[8] Bouguila N, ElGuebaly W (2009) Discrete data clustering using finite mixture models. Pattern Recognit 42(1):33–42
[9] Cheng BYM, Carbonell JG, Klein-Seetharaman J (2005) Protein classification based on text document classification techniques. Proteins Struct Funct Bioinform 58:955–970
[10] Witten IH, Bell TC (1991) The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Trans Inform Theory 37(4):1085–1094
[11] Fienberg SE, Holland PW (1973) Simultaneous estimation of multinomial cell probabilities. J Am Stat Assoc 68(343):683–691
[12] Hall P, Titterington DM (1987) On smoothing sparse multinomial data. Aust J Stat 29(1):19–37
[13] Simonoff JS (1995) Smoothing categorical data. J Stat Plann Infer 47:41–69
[14] Bouguila N, Ziou D (2007) Unsupervised learning of a finite discrete mixture: applications to texture modeling and image databases summarization. J Vis Commun Image Represent 18(4):295–309
[15] Bouguila N, Ziou D (2004) A powerful finite mixture model based on the generalized Dirichlet distribution: unsupervised learning and applications. In: Proceedings of the 17th international conference on pattern recognition (ICPR), pp 280–283
[16] Bouguila N, Ziou D (2004) Dirichlet-based probability model applied to human skin detection. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 521–524
[17] Bouguila N, Ziou D, Hammoud RI (2009) On Bayesian analysis of a finite generalized Dirichlet mixture via a Metropolis-within-Gibbs sampling. Pattern Anal Appl 12(2):151–166
[18] McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
[19] Hoare Z (2008) Landscapes of Naive Bayes classifiers. Pattern Anal Appl 11(1):59–72
[20] Andrés-Ferrer J, Juan A (2010) Constrained domain maximum likelihood estimation for Naive Bayes text classification. Pattern Anal Appl 13(2):189–196
[21] Goodman LA (1970) The multivariate analysis of qualitative data: interactions among multiple classifications. J Am Stat Assoc 65(329):226–256
[22] Goodman LA (1971) The analysis of multidimensional contingency tables: stepwise procedures and direct estimation methods for building models for multiple classifications. Technometrics 13(1):33–61
[23] Goodman LA (1964) Interactions in multidimensional contingency tables. Ann Math Stat 35(2):632–646
[24] Gart JJ, Zweifel JR (1967) On the bias of various estimators of the logit and its variance with application to quantal bioassay. Biometrika 54(1/2):181–187
[25] Grizzle JE, Starmer CF, Koch GG (1969) Analysis of categorical data by linear models. Biometrics 25(3):489–504
[26] Bouguila N, Ziou D (2004) Improving content based image retrieval systems using finite multinomial Dirichlet mixture. In: Proceedings of the IEEE workshop on machine learning for signal processing (MLSP), pp 23–32
[27] Bouguila N (2007) Spatial color image databases summarization. In: IEEE international conference on acoustics, speech, and signal processing (ICASSP), vol 1, Honolulu, HI, USA, pp 953–956
[28] Good IJ (1967) A Bayesian significance test for multinomial distributions (with discussion). J R Stat Soc B 29(3):399–431
[29] Fienberg SE (1972) On the choice of flattening constants for estimating multinomial probabilities. J Multivar Anal 2(1):127–134
[30] Lidstone GJ (1920) Note on the general case of the Bayes–Laplace formula for inductive or a posteriori probabilities. Trans Fac Actuar 8:182–192
[31] Jeffreys H (1961) Theory of probability, 3rd edn. Clarendon Press, Oxford
[32] Perks W (1947) Some observations on inverse probability including a new indifference rule (with discussion). J Inst Actuar 73:285–334
[33] Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543
[34] Lochner RH (1975) A generalized Dirichlet distribution in Bayesian life testing. J R Stat Soc B 37:103–113
[35] Bouguila N, ElGuebaly W (2008) On discrete data clustering. In: Proceedings of the Pacific–Asia conference on knowledge discovery and data mining (PAKDD). LNCS, vol 5012. Springer, Osaka, pp 503–510
[36] Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, New York
[37] Bouguila N, Ziou D (2005) Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications. Pattern Recognit Lett 26(12):1916–1925
[38] Bouguila N, Ziou D, Monga E (2006) Practical Bayesian estimation of a finite beta mixture through Gibbs sampling and its applications. Stat Comput 16(2):215–225
[39] Robbins HE (1956) An empirical Bayes approach to statistics. In: Neyman J (ed) Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1, pp 157–163
[40] Robbins HE (1964) The empirical Bayes approach to statistical decision problems. Ann Math Stat 35(1):1–20
[41] Deely JJ, Lindley DV (1981) Bayes empirical Bayes. J Am Stat Assoc 76(376):833–841
[42] Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods for data analysis, 2nd edn. Chapman & Hall/CRC, Boca Raton
[43] McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley
[44] Hu T, Sung SY (2005) Clustering spatial data with a hybrid EM approach. Pattern Anal Appl 8(1–2):139–148
[45] Bouguila N, Ziou D (2007) High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. IEEE Trans Pattern Anal Mach Intell 29(10):1716–1731
[46] Rissanen J (1978) Modeling by shortest data description. Automatica 14:465–471
[47] Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1–2):143–175
[48] Lebanon G, Lafferty J (2004) Hyperplane margin classifiers on the multinomial manifold. In: Proceedings of the international conference on machine learning (ICML), pp 66–73
[49] Vapnik VN (1998) Statistical learning theory. Wiley, New York
[50] Zhang D, Chen X, Lee WS (2005) Text classification with kernels on the multinomial manifold. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 266–273
[51] Jebara T, Kondor R, Howard A (2004) Probability product kernels. J Mach Learn Res 5:819–844
[52] Moreno PJ, Ho PP, Vasconcelos N (2003) A Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge
[53] Topsoe F (2000) Some inequalities for information divergence and related measures of discrimination. IEEE Trans Inform Theory 46(4):1602–1609
[54] Chapelle O, Haffner P, Vapnik VN (1999) Support vector machines for histogram-based image classification. IEEE Trans Neural Netw 10(5):1055–1064
[55] Varma M, Zisserman A (2002) Classifying images of materials: achieving viewpoint and illumination independence. In: Proceedings of the European conference on computer vision (ECCV), pp 255–271
[56] Szczypiński PM, Strzelecki M, Materka A, Klepaczko A (2009) MaZda: a software package for image texture analysis. Comput Methods Prog Biomed 94(1):66–76
[57] Zhu SC, Wu Y, Mumford D (1998) Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. Int J Comput Vis 27(2):107–126
[58] Varma M, Zisserman A (2009) A statistical approach to material classification using image patch exemplars. IEEE Trans Pattern Anal Mach Intell 31(11):2032–2047
[59] Dana KJ, van Ginneken B, Nayar SK, Koenderink JJ (1999) Reflectance and texture of real-world surfaces. ACM Trans Graphics 18(1):1–34
[60] Lazebnik S, Schmid C, Ponce J (2005) A sparse texture representation using local affine regions. IEEE Trans Pattern Anal Mach Intell 27(8):1265–1278
[61] Grzegorzek M (2010) A system for 3D texture-based probabilistic object recognition and its applications. Pattern Anal Appl 13(3):333–348
[62] Schiele B, Pentland A (1999) Probabilistic object recognition and localization. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 177–182
[63] Amsaleg L, Gros P (2001) Content-based retrieval using local descriptors: problems and issues from a database perspective. Pattern Anal Appl 4(2–3):108–124
[64] Caputo B, Wallraven C, Nilsback M-E (2004) Object categorization via local kernels. In: Proceedings of the 17th international conference on pattern recognition (ICPR), pp 132–135
[65] Lyu S (2005) Mercer kernels for object recognition with local features. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 223–229
[66] Deselaers T, Keysers D, Ney H (2005) Discriminative training for object recognition using image patches. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 157–162
[67] Loupias E, Sebe N, Bres S, Jolion J (2000) Wavelet-based salient points for image retrieval. In: Proceedings of the IEEE international conference on image processing (ICIP), pp 518–521
[68] Linde Y, Buzo A, Gray RM (1980) An algorithm for vector quantization design. IEEE Trans Commun 28:84–95
[69] Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical Report CUCS-005-96, Columbia University
[70] Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-100). Technical Report CUCS-006-96, Columbia University
[71] Weber M, Welling M, Perona P (2000) Unsupervised learning of object models and recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 18–32
[72] Bouguila N, Ziou D (2010) A Dirichlet process mixture of generalized Dirichlet distributions for proportional data modeling. IEEE Trans Neural Netw 21(1):107–122
[73] Bouguila N (2009) A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity. IEEE Trans Knowl Data Eng 21(12):1649–1664


Acknowledgments

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Authors and Affiliations

Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, H3G 1T7, Canada
Nizar Bouguila


Corresponding author

Correspondence to Nizar Bouguila.

Appendices

Appendix 1: Proof of Eq. 23

We start by computing the posterior distribution:

$$\begin{aligned} p({\varvec {\pi}}|{\mathbf {X}},\Uptheta)& =\frac{p({\mathbf {X}},{\varvec {\pi}} |\Uptheta)}{p({\mathbf {X}}|\Uptheta)}\\ &=\frac{1}{\sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j)\Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})} \frac{\Upgamma(\alpha_j^{\prime})\Upgamma(\beta_j^{\prime})\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})\Upgamma(\alpha_j^{\prime}+\beta_j^{\prime})}}\\ &\quad \times \sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j)\Upgamma(\beta_j)}\\ &\quad \times \prod_{v=1}^{V-1} \frac{\pi_v^{\alpha_{jv}+X_v-1}}{\Upgamma(\alpha_{jv})} \left(\sum_{v=1}^{V-1} \pi_{v}\right)^{\alpha_j-\sum_{v=1}^{V-1} \alpha_{jv}}\\ &\quad \times \left(1-\sum_{v=1}^{V-1} \pi_v\right)^{\beta_j+X_V-1} \end{aligned}$$

(44)
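Each term of the mixture in Eq. 44 is, up to its weight, a Beta-Liouville density whose parameters can be read off the exponents: $\alpha_{jv}^{\prime}=\alpha_{jv}+X_v$, $\beta_j^{\prime}=\beta_j+X_V$, and, because the exponent $\alpha_j-\sum_{v}\alpha_{jv}$ of $\sum_v\pi_v$ is unchanged, $\alpha_j^{\prime}=\alpha_j+\sum_{v=1}^{V-1}X_v$. A convenient sanity check for the expectation computed next is the stochastic representation of a Beta-Liouville vector, $\pi_v=U\,Y_v$, with $U\sim\mathrm{Beta}(\alpha,\beta)$ independent of $(Y_1,\ldots,Y_{V-1})\sim\mathrm{Dirichlet}(\alpha_1,\ldots,\alpha_{V-1})$, which gives $E[\pi_l]=\frac{\alpha}{\alpha+\beta}\frac{\alpha_l}{\sum_v\alpha_v}$ directly. The minimal sketch below (standard library only; function names are illustrative, not from the paper) simulates one posterior component and compares the Monte Carlo mean with that closed form. It is a check, not part of the method.

```python
import random

def sample_beta_liouville(alpha, beta, alphas, rng):
    """One draw (pi_1, ..., pi_{V-1}) from a Beta-Liouville via pi_v = U * Y_v."""
    u = rng.betavariate(alpha, beta)                 # U ~ Beta(alpha, beta): total mass sum_v pi_v
    g = [rng.gammavariate(a, 1.0) for a in alphas]   # Y ~ Dirichlet(alphas) via normalised Gamma draws
    s = sum(g)
    return [u * gv / s for gv in g]

def mc_mean(alpha, beta, alphas, n=50_000, seed=0):
    """Monte Carlo estimate of E[pi_v], v = 1..V-1."""
    rng = random.Random(seed)
    acc = [0.0] * len(alphas)
    for _ in range(n):
        for v, pv in enumerate(sample_beta_liouville(alpha, beta, alphas, rng)):
            acc[v] += pv
    return [a / n for a in acc]

def closed_form_mean(alpha, beta, alphas):
    """E[pi_l] = alpha/(alpha+beta) * alpha_l / sum_v alpha_v (U and Y independent)."""
    s = sum(alphas)
    return [alpha / (alpha + beta) * a / s for a in alphas]

# One posterior component: prior (alpha, beta, alphas) = (2, 1, [1, 1]) and counts X = (3, 0, 1)
# give posterior parameters (5, 2, [4, 1]).
print(mc_mean(5.0, 2.0, [4.0, 1.0]), closed_form_mean(5.0, 2.0, [4.0, 1.0]))
```

For this toy component the closed form is $(4/7, 1/7)$, and the simulated means should agree with it to about two decimal places.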

To estimate a given parameter $\pi_l$, $l=1,\ldots,V-1$, when a Beta-Liouville mixture is taken as the prior, we compute the expectation of $\pi_l$ under the posterior distribution above (the last cell then follows from $\hat{\pi}_V=1-\sum_{l=1}^{V-1}\hat{\pi}_l$):

$$ \begin{aligned} \hat{\pi}_l&=\int \pi_l\, p({\varvec {\pi}}|{\mathbf {X}},\Uptheta)\,{\rm{d}}{\varvec {\pi}}\\ &=\frac{1}{\sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j) \Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})} \frac{\Upgamma(\alpha_j^{\prime})\Upgamma(\beta_j^{\prime})\prod_{v=1}^{V-1} \Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})\Upgamma(\alpha_j^{\prime}+\beta_j^{\prime})}}\\ &\quad\times \sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)} {\Upgamma(\alpha_j)\Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})}\\ & \quad\times \int\left[\prod_{v=1}^{V-1}\pi_v^{\alpha_{jv}+\delta(v=l)+X_v-1}\right.\\ &\left.\left(\sum_{v=1}^{V-1} \pi_{v}\right)^{\alpha_j-\sum_{v=1}^{V-1} \alpha_{jv}}\left(1-\sum_{v=1}^{V-1} \pi_{v}\right)^{\beta_j+X_V-1}\right]{\rm{d}}{\varvec {\pi}}\\ &=\frac{1}{\sum_{j=1}^M p_j \frac{\Upgamma\left(\sum_{v=1}^{V-1} \alpha_{jv}\right)\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j) \Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})} \frac{\Upgamma(\alpha_j^{\prime})\Upgamma(\beta_j^{\prime})\prod_{v=1}^{V-1} \Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})\Upgamma(\alpha_j^{\prime}+\beta_j^{\prime})}}\\ &\quad\times \sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j) \Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})}\\ &\quad\times \frac{\Upgamma(\alpha_j^{\prime}+1)\Upgamma(\beta_j^{\prime}) \Upgamma(\alpha_{jl}^{\prime}+1)\prod_{v=1,v \neq l}^{V-1}\Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}+1)\Upgamma(\alpha_j^{\prime}+\beta_j^{\prime}+1)} \end{aligned} $$

where $\delta(v=l)=1$ if $v=l$ and $0$ otherwise. Since $\Upgamma (x+1)=x \Upgamma(x)$, we obtain

$$ \hat{\pi}_l=\sum_{j=1}^M p(j|{\mathbf {X}}) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jl}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}} $$

where

$$ p(j|{\mathbf {X}})=\frac{p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j) \Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})} \frac{\Upgamma(\alpha_{j}^{\prime}) \Upgamma(\beta_{j}^{\prime})\prod_{v=1}^{V-1} \Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})\Upgamma(\alpha_j^{\prime}+\beta_j^{\prime})}} {\sum_{j=1}^M p_j \frac{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv})\Upgamma(\alpha_j+\beta_j)}{\Upgamma(\alpha_j) \Upgamma(\beta_j)\prod_{v=1}^{V-1}\Upgamma(\alpha_{jv})} \frac{\Upgamma(\alpha_{j}^{\prime})\Upgamma(\beta_{j}^{\prime}) \prod_{v=1}^{V-1}\Upgamma(\alpha_{jv}^{\prime})}{\Upgamma(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})\Upgamma(\alpha_{j}^{\prime}+\beta_{j}^{\prime})}} $$
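The estimator can be evaluated directly. The sketch below is a minimal Python illustration, assuming each component is passed as a tuple `(p_j, alpha_j, beta_j, [alpha_j1, ..., alpha_j,V-1])` (a hypothetical layout) and that the posterior parameters are the updates implied by the exponents in Eq. 44, $\alpha_{jv}^{\prime}=\alpha_{jv}+X_v$, $\alpha_j^{\prime}=\alpha_j+\sum_{v=1}^{V-1}X_v$, $\beta_j^{\prime}=\beta_j+X_V$. It computes $p(j|{\mathbf {X}})$ in log space with `math.lgamma` and a log-sum-exp, since the Gamma ratios overflow quickly for realistic counts, and returns $\hat{\pi}_V$ as the weighted average of $\beta_j^{\prime}/(\alpha_j^{\prime}+\beta_j^{\prime})$, which equals $1-\sum_l\hat{\pi}_l$.

```python
import math

def smoothed_estimate(components, counts):
    """Posterior-mean estimate (pi_1, ..., pi_V) under a Beta-Liouville mixture prior.

    components: list of (p_j, alpha_j, beta_j, [alpha_j1, ..., alpha_j,V-1])  (assumed layout)
    counts:     [X_1, ..., X_V]
    """
    k = len(counts) - 1                               # number of free coordinates, V - 1
    log_w, post = [], []
    for p_j, a, b, av in components:
        ap = [x + c for x, c in zip(av, counts[:k])]  # alpha'_jv = alpha_jv + X_v
        A = a + sum(counts[:k])                       # alpha'_j = alpha_j + sum_{v<V} X_v
        B = b + counts[k]                             # beta'_j = beta_j + X_V
        # log[p_j * prior normaliser / posterior normaliser]: unnormalised log p(j|X)
        lw = (math.log(p_j)
              + math.lgamma(sum(av)) + math.lgamma(a + b)
              - math.lgamma(a) - math.lgamma(b) - sum(math.lgamma(x) for x in av)
              + math.lgamma(A) + math.lgamma(B) + sum(math.lgamma(x) for x in ap)
              - math.lgamma(sum(ap)) - math.lgamma(A + B))
        log_w.append(lw)
        post.append((A, B, ap))
    m = max(log_w)                                    # log-sum-exp for numerical stability
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    resp = [x / z for x in w]                         # p(j|X)
    pi_hat = [0.0] * (k + 1)
    for r, (A, B, ap) in zip(resp, post):
        S = sum(ap)
        for l in range(k):
            pi_hat[l] += r * A / (A + B) * ap[l] / S
        pi_hat[k] += r * B / (A + B)                  # E[1 - sum_v pi_v] = beta'/(alpha'+beta')
    return pi_hat, resp
```

As a check, when $\alpha_j=\sum_v\alpha_{jv}$ the factor $(\sum_v\pi_v)^{\alpha_j-\sum_v\alpha_{jv}}$ equals one and the component is a Dirichlet with parameters $(\alpha_{j1},\ldots,\alpha_{j,V-1},\beta_j)$. For example, `smoothed_estimate([(1.0, 2.0, 1.0, [1.0, 1.0])], [3, 0, 1])` returns the familiar add-one estimate $(4/7, 1/7, 2/7)$ up to rounding, and the zero count in the second cell still receives positive mass.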

Appendix 2: Proof of Eqs. 29, 30 and 31

We have

$$ \begin{aligned} \log &f({\mathcal{X}}|\Uptheta)\\ &\quad=\sum_{n=1}^N\left[\sum_{v=1}^{V-1} {X_{nv}} \log\left(\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+ \beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right.\\ &\left.\qquad+{X_{nV}} \log \left(1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right] \end{aligned} $$
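Because $\sum_{v=1}^{V-1}\alpha_{jv}^{\prime}/\sum_{l=1}^{V-1}\alpha_{jl}^{\prime}=1$ for every component and $\sum_{j=1}^M p(j|{\mathbf {X}}_n)=1$, the argument of the second logarithm simplifies; this worked step shows that it is the smoothed estimate $\hat{\pi}_V$ of the last cell for ${\mathbf {X}}_n$, and that it stays strictly positive as long as every $\beta_j^{\prime}>0$:

$$ 1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{l=1}^{V-1} \alpha_{jl}^{\prime}} =1-\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} =\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\beta_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} $$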

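Thus, treating the responsibilities $p(j|{\mathbf {X}}_n)$ as constants (they are not differentiated in the steps below),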

$$ \begin{aligned} &\frac{\partial \log f({\mathcal{X}}|\Uptheta)}{\partial \alpha_j}\\ &=\sum_{n=1}^{N}\left[\frac{\partial}{\partial \alpha_{j}}\left(\sum_{v=1}^{V-1} {X_{nv}} \hbox{log}\left(\sum_{j=1}^{M} p(j|{{\mathbf{X}}}_{n}) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right)\right.\\ &\quad+\left.{X_{nV}} \frac{\partial}{\partial \alpha_j}\hbox{log} \left(1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right]\\ &=\sum_{n=1}^N\left[\sum_{v=1}^{V-1}{X_{nv}}\frac{\frac{\partial} {\partial \alpha_j} \left(\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right.\\ &\quad +\left.{X_{nV}} \frac{\frac{\partial}{\partial \alpha_j}\left(1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ &=\sum_{n=1}^N\left[\sum_{v=1}^{V-1}{X_{nv}}F_{njv} \frac{\partial}{\partial \alpha_j}\hbox{log}\left( p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right.\\ &\quad\left.-{X_{nV}} \frac{\frac{\partial}{\partial \alpha_j}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ &=\sum_{n=1}^N\left[\sum_{v=1}^{V-1}{X_{nv}}F_{njv} \frac{\partial}{\partial \alpha_j}\hbox{log}\left(\frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+ \beta_{j}^{\prime}}\right)\right.\\ &\quad\left.-{X_{nV}} F_{njV}\frac{\partial}{\partial \alpha_j}\hbox{log}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right]\\ &=\sum_{n=1}^N\left[\sum_{v=1}^{V-1}{X_{nv}}F_{njv} \left(\frac{1}{\alpha_{j}^{\prime}}-\frac{1} {\alpha_{j}^{\prime}+\beta_{j}^{\prime}}\right)\right.\\ &\quad\left.-{X_{nV}} F_{njV}\frac{\sum_{v=1}^{V-1} p(j|{{\mathbf{X}}}_n) \frac{\beta_{j}^{\prime}}{(\alpha_{j}^{\prime}+\beta_{j}^{\prime})^2} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}{\sum_{v=1}^{V-1} p(j|{{\mathbf{X}}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ \end{aligned} $$

where

$$ F_{njv}=\frac{p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}{\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}} $$

$$ F_{njV}=\frac{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}} $$
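The ratio multiplying $X_{nV}F_{njV}$ in the last line of the $\alpha_j$ derivation simplifies further: $p(j|{\mathbf {X}}_n)$ cancels, and since $\sum_{v}\alpha_{jv}^{\prime}/\sum_{v}\alpha_{jv}^{\prime}=1$ the ratio reduces to $\beta_j^{\prime}/(\alpha_j^{\prime}(\alpha_j^{\prime}+\beta_j^{\prime}))=1/\alpha_j^{\prime}-1/(\alpha_j^{\prime}+\beta_j^{\prime})$. The gradient can therefore be written more compactly as follows, with $\alpha_j^{\prime}$ and $\beta_j^{\prime}$ computed for ${\mathbf {X}}_n$ inside the sum over $n$:

$$ \frac{\partial \log f({\mathcal{X}}|\Uptheta)}{\partial \alpha_j}=\sum_{n=1}^N\left(\frac{1}{\alpha_j^{\prime}}-\frac{1}{\alpha_j^{\prime}+\beta_j^{\prime}}\right)\left[\sum_{v=1}^{V-1}X_{nv}F_{njv}-X_{nV}F_{njV}\right] $$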

Following the same steps, we obtain

$$ \begin{aligned} \frac{\partial \log f({\mathcal{X}}|\Uptheta)}{\partial \beta_j}&=\sum_{n=1}^N\left[\vphantom{{X_{nV}} F_{njV}\frac{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{(\alpha_j^{\prime}+\beta_j^{\prime})^2}\frac{\alpha_{jv}^{\prime}} {\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}}\sum_{v=1}^{V-1}{X_{nv}}F_{njv}\left(-\frac{1}{\alpha_j^{\prime}+\beta_j^{\prime}}\right)\right.\\ &\left.+{X_{nV}} F_{njV}\frac{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{(\alpha_j^{\prime}+\beta_j^{\prime})^2}\frac{\alpha_{jv}^{\prime}} {\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_j^{\prime}}{\alpha_j^{\prime}+\beta_j^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right] \\ \end{aligned} $$
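The $\alpha_j$ and $\beta_j$ gradients can be checked numerically against finite differences of $\log f({\mathcal{X}}|\Uptheta)$. The derivation differentiates only through $\alpha_j^{\prime}=\alpha_j+\sum_{v<V}X_{nv}$ and $\beta_j^{\prime}=\beta_j+X_{nV}$ and keeps $p(j|{\mathbf {X}}_n)$ fixed, so this minimal Python sketch does the same: the responsibilities are passed in as a fixed matrix `R`. The component layout `(alpha_j, beta_j, [alpha_j1, ..., alpha_j,V-1])`, the helper names `_post`, `_terms`, `log_f`, `grads` and any toy data are illustrative assumptions, not the paper's code. `grads` implements the last lines of the two derivations term by term.

```python
import math

def _post(comp, x):
    """Posterior (alpha'_j, beta'_j, [alpha'_jv]) of one component for a count vector x."""
    a, b, av = comp
    k = len(av)                                          # k = V - 1
    return a + sum(x[:k]), b + x[k], [c + xv for c, xv in zip(av, x[:k])]

def _terms(comps, x, r):
    """t[j][v] = p(j|x) * a'_j/(a'_j+b'_j) * a'_jv / sum_v a'_jv, for v = 1..V-1."""
    out = []
    for rj, comp in zip(r, comps):
        A, B, ap = _post(comp, x)
        S = sum(ap)
        out.append([rj * A / (A + B) * c / S for c in ap])
    return out

def log_f(comps, X, R):
    """log f(X|Theta) with the responsibilities R[n][j] = p(j|X_n) held fixed."""
    total = 0.0
    for x, r in zip(X, R):
        t = _terms(comps, x, r)
        k = len(t[0])
        pi = [sum(tj[v] for tj in t) for v in range(k)]  # smoothed pi_hat_nv
        total += sum(x[v] * math.log(pi[v]) for v in range(k))
        total += x[k] * math.log(1.0 - sum(pi))
    return total

def grads(comps, X, R, j):
    """(d log f / d alpha_j, d log f / d beta_j), term by term as in Appendix 2."""
    ga = gb = 0.0
    for x, r in zip(X, R):
        t = _terms(comps, x, r)
        k = len(t[0])
        pi = [sum(tj[v] for tj in t) for v in range(k)]
        A, B, ap = _post(comps[j], x)
        S = sum(ap)
        F = [t[j][v] / pi[v] for v in range(k)]         # F_njv
        den = sum(t[j])                                 # sum over v of component j's summands
        FV = den / (1.0 - sum(pi))                      # F_njV
        sxf = sum(x[v] * F[v] for v in range(k))        # sum_v X_nv F_njv
        num_a = sum(r[j] * B / (A + B) ** 2 * c / S for c in ap)
        num_b = sum(r[j] * A / (A + B) ** 2 * c / S for c in ap)
        ga += sxf * (1.0 / A - 1.0 / (A + B)) - x[k] * FV * num_a / den
        gb += sxf * (-1.0 / (A + B)) + x[k] * FV * num_b / den
    return ga, gb
```

With arbitrary toy values such as `comps = [(2.0, 1.5, [0.7, 1.2, 0.9]), (1.0, 3.0, [2.0, 0.5, 1.0])]`, `X = [[3, 0, 2, 1], [0, 4, 1, 0], [1, 1, 0, 5]]` and `R = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]`, both values returned by `grads` should match central differences such as $(\log f(\alpha_j+h)-\log f(\alpha_j-h))/2h$ to several digits. The cancellation used for $\alpha_j$ also reduces the $\beta_j$ gradient to $\sum_n\frac{1}{\alpha_j^{\prime}+\beta_j^{\prime}}\left[X_{nV}F_{njV}-\sum_{v}X_{nv}F_{njv}\right]$.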

and that

$$ \begin{aligned} &\frac{\partial \log f({\mathcal{X}}|\Uptheta)}{\partial \alpha_{jv}}\\ &=\sum_{n=1}^N\left[{X_{nv}}\left(\frac{\frac{\partial}{\partial \alpha_{jv}} \left(\sum_{j=1}^M p(j|\mathbf{X}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{\sum_{j=1}^{M} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right)\right.\\ &\quad\left.+{X_{nV}}\frac{\frac{\partial}{\partial \alpha_{jv}}\left(1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ &=\sum_{n=1}^N\left[\vphantom{-{X_{nV}} \frac{\frac{\partial}{\partial \alpha_{jv}}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}}{X_{nv}}\left(F_{njv}\frac{\partial}{\partial {\alpha_{jv}}}\hbox{log}\left(p(j|{{\mathbf{X}}}_{n}) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+ \beta_{j}^{\prime}}\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right)\right. 
\\ &\quad\left.-{X_{nV}} \frac{\frac{\partial}{\partial \alpha_{jv}}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ &=\sum_{n=1}^N\left[{X_{nv}}\left(F_{njv}\frac{\partial}{\partial \alpha_{jv}}\hbox{log}\left(\frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right)\right. \\ &\quad\left.-{X_{nV}} F_{njV}\frac{\partial}{\partial \alpha_{jv}} \hbox{log}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right] \\ &=\sum_{n=1}^N\left[\vphantom{-{X_{nV}} \frac{\frac{\partial}{\partial \alpha_{jv}}\left(\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)}{1-\sum_{v=1}^{V-1}\sum_{j=1}^M p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}}{X_{nv}}\left(F_{njv}\left(\frac{1}{\alpha_{jv}^{\prime}}-\frac{1}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}\right)\right)\right.\\ &\quad\left.-{X_{nV}} F_{njV}\frac{ p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})-\alpha_{jv}^{\prime}}{(\sum_{v=1}^{V-1} \alpha_{jv}^{\prime})^2}}{\sum_{v=1}^{V-1} p(j|{\mathbf {X}}_n) \frac{\alpha_{j}^{\prime}}{\alpha_{j}^{\prime}+\beta_{j}^{\prime}} \frac{\alpha_{jv}^{\prime}}{\sum_{v=1}^{V-1} \alpha_{jv}^{\prime}}}\right]\\ \end{aligned} $$

About this article

Cite this article

Bouguila, N. On the smoothing of multinomial estimates using Liouville mixture models and applications. Pattern Anal Applic 16, 349–363 (2013). https://doi.org/10.1007/s10044-011-0236-8


Received: 18 September 2010
Accepted: 20 August 2011
Published: 06 September 2011
Issue Date: August 2013
DOI: https://doi.org/10.1007/s10044-011-0236-8
