Maximum likelihood estimation for incomplete multinomial data via the weaver algorithm

Statistics and Computing

Abstract

In a multinomial model, the sample space is partitioned into a disjoint union of cells. The partition is usually immutable during sampling of the cell counts. In this paper, we extend the multinomial model to the incomplete multinomial model by relaxing the constant partition assumption to allow the cells to be variable and the counts collected from non-disjoint cells to be modeled in an integrated manner for inference on the common underlying probability. The incomplete multinomial likelihood is parameterized by the complete-cell probabilities from the most refined partition. Its sufficient statistics include the variable-cell formation observed as an indicator matrix and all cell counts. With externally imposed structures on the cell formation process, it reduces to special models including the Bradley–Terry model, the Plackett–Luce model, etc. Since the conventional method, which solves for the zeros of the score functions, is unfruitful, we develop a new approach to establishing a simpler set of estimating equations to obtain the maximum likelihood estimate (MLE), which seeks the simultaneous maximization of all multiplicative components of the likelihood by fitting each component into an inequality. As a consequence, our estimation amounts to solving a system of the equality attainment conditions to the inequalities. The resultant MLE equations are simple and immediately invite a fixed-point iteration algorithm for solution, which is referred to as the weaver algorithm. The weaver algorithm is short and amenable to parallel implementation. We also derive the asymptotic covariance of the MLE, verify main results with simulations, and compare the weaver algorithm with an MM/EM algorithm based on fitting a Plackett–Luce model to a benchmark data set.

Acknowledgements

The authors are grateful to the two referees, Associate Editor, and Editor for their insightful comments that have significantly improved the article. Yin’s research was supported in part by a grant (17326316) from the Research Grants Council of Hong Kong.

Author information

Correspondence to Fanghu Dong.

Appendices

Appendix A: Proof of Lemma 1

Proof

(Work with \(x_{i}/a_{i}\) and connect to the weighted AM–GM inequality, with its equality condition). Rewrite the target inequality as

$$\begin{aligned} \prod \limits _{i=1}^{n}{a_{i}^{{a_{i}}}}{\prod \limits _{i=1}^{n}{\left( {\frac{{x_{i}}}{{a_{i}}}}\right) }^{{a_{i}}}}\leqslant \frac{{\prod \limits _{i=1}^{n}{a_{i}^{{a_{i}}}}}}{{{\left( {\sum \limits _{i=1}^{n}{a_{i}}}\right) }^{\sum \limits _{i=1}^{n}{a_{i}}}}}{\left( {\sum \limits _{i=1}^{n}{a_{i}\frac{{x_{i}}}{{a_{i}}}}}\right) ^{\sum \limits _{i=1}^{n}{a_{i}}}}. \end{aligned}$$

By substituting \(y_{i}\) for \(x_{i}/a_{i}\) and taking the \(\left( {\sum \limits _{i=1}^{n}{a_{i}}}\right) \)-th root on both sides, we have

$$\begin{aligned} \prod \limits _{i=1}^{n}{y_{i}^{\frac{{a_{i}}}{{\sum \limits _{i=1}^{n}{a_{i}}}}}}\leqslant \sum \limits _{i=1}^{n}{\frac{{a_{i}}}{{\sum \limits _{i=1}^{n}{a_{i}}}}{y_{i}}}. \end{aligned}$$

After a further substitution of \(w_{i}=a_{i}/\sum _{j=1}^{n}a_{j}\), we arrive at

$$\begin{aligned} \prod \limits _{i=1}^{n}{y_{i}^{{w_{i}}}}\leqslant \sum \limits _{i=1}^{n}{{w_{i}}{y_{i}}}, \end{aligned}$$

which is the weighted AM–GM inequality. Equality holds if and only if all the \(y_{i}\) are equal, that is, if and only if \(x_{i}/a_{i}=\tau \) for all i, where the common constant \(\tau \) must be positive. \(\square \)
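As a numerical sanity check, the following minimal Python sketch evaluates both sides of the inequality in the form \(\prod x_{i}^{a_{i}}\leqslant \prod a_{i}^{a_{i}}\,(\sum a_{i})^{-\sum a_{i}}\,(\sum x_{i})^{\sum a_{i}}\), which is the form obtained from Corollary 3 with all \(c_{i}=1\). The function name and the test vectors are illustrative assumptions, not taken from the article.

```python
import math

def lemma1_sides(x, a):
    """Return (lhs, rhs) of prod x_i^a_i <= prod a_i^a_i / A^A * (sum x_i)^A, with A = sum a_i."""
    A = sum(a)
    lhs = math.prod(xi ** ai for xi, ai in zip(x, a))
    const = math.prod(ai ** ai for ai in a) / A ** A
    rhs = const * sum(x) ** A
    return lhs, rhs

a = [3.0, 2.0, 1.5]                                        # hypothetical positive exponents
lhs_eq, rhs_eq = lemma1_sides([2.0 * ai for ai in a], a)   # x proportional to a: equality case
lhs_ne, rhs_ne = lemma1_sides([1.0, 1.0, 1.0], a)          # x not proportional to a: strict case
print(lhs_eq, rhs_eq)
print(lhs_ne, rhs_ne)
```

The first pair should agree up to floating-point rounding, while the second pair should show a strict gap.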

Appendix B: Examples and Corollaries of Lemma 1

Example 5

\(\left( x_{1}+x_{2}\right) ^{5}\geqslant \frac{5^{5}}{3^{3}2^{2}}x_{1}^{3}x_{2}^{2}\). This inequality holds because

$$\begin{aligned} x_{1}^{3}x_{2}^{2}= & {} \frac{{x_{1}}}{3}\frac{{x_{1}}}{3}\frac{{x_{1}}}{3}\frac{{x_{2}}}{2}\frac{{x_{2}}}{2}{3^{3}}{2^{2}} \\\leqslant & {} {3^{3}}{2^{2}}{\left( {\frac{{3\frac{{x_{1}}}{3}+2\frac{{x_{2}}}{2}}}{{3+2}}}\right) ^{3+2}} \\= & {} {3^{3}}{2^{2}}{\left( {\frac{{{x_{1}}+{x_{2}}}}{5}}\right) ^{5}}, \end{aligned}$$

where the equality is attained if and only if \((x_{1},x_{2})\) is colinear with (3, 2).

Example 6

\(\left( x_{1}+x_{2}\right) ^{7}x_{3}^{3}x_{4}^{5} \leqslant \frac{{3^{3}}{5^{5}}{7^{7}}}{{15}^{15}} \left( {x_{1}}+{x_{2}}+{x_{3}}+{x_{4}}\right) ^{15}\). This inequality holds because

$$\begin{aligned} \left( x_{1}+x_{2}\right) ^{7}x_{3}^{3}x_{4}^{5} \leqslant {{7^{7}}3^{3}}{5^{5}}{\left( {\frac{7\frac{{{x_{1}}+{x_{2}}}}{7}+{3\frac{{x_{3}}}{3}+5\frac{{x_{4}}}{5}}}{{7+3+5}}}\right) ^{7+3+5}}, \end{aligned}$$

where the equality is attained if and only if \((x_{1}+x_{2},\,x_{3},\,x_{4})\) is colinear with (7, 3, 5). More importantly, together with the inequality in the previous example, the two equalities are jointly attained if and only if \((x_{1},\,x_{2},\,x_{3},\,x_{4})\) is colinear with (21, 14, 15, 25).
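The joint equality claim can be checked in exact integer arithmetic. The sketch below clears denominators in the inequalities of Examples 5 and 6 so that each becomes "gap \(\geqslant 0\)"; the function names are illustrative assumptions.

```python
def example5_gap(x1, x2):
    """3^3 * 2^2 * (x1 + x2)^5 - 5^5 * x1^3 * x2^2, nonnegative by Example 5."""
    return 3**3 * 2**2 * (x1 + x2) ** 5 - 5**5 * x1**3 * x2**2

def example6_gap(x1, x2, x3, x4):
    """3^3 5^5 7^7 (x1+x2+x3+x4)^15 - 15^15 (x1+x2)^7 x3^3 x4^5, nonnegative by Example 6."""
    total = x1 + x2 + x3 + x4
    return 3**3 * 5**5 * 7**7 * total**15 - 15**15 * (x1 + x2) ** 7 * x3**3 * x4**5

joint = (21, 14, 15, 25)
print(example5_gap(*joint[:2]), example6_gap(*joint))        # prints "0 0"
print(example5_gap(1, 1) > 0, example6_gap(1, 1, 1, 1) > 0)  # prints "True True"
```

Both gaps vanish at (21, 14, 15, 25) and are strictly positive at a point that is not collinear with it.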

Corollary 1

If we require \(\sum _{i=1}^{n}{x_{i}}=\sum _{i=1}^{n}{a_{i}}=1\) in Lemma 1, then

$$\begin{aligned}&\prod \limits _{i=1}^{n}{x_{i}^{{a_{i}}}} \leqslant \prod \limits _{i=1}^{n}{a_{i}^{{a_{i}}}},\\&\sum \limits _{i=1}^{n}{{a_{i}}\ln {x_{i}}} \leqslant \sum \limits _{i=1}^{n}{{a_{i}}\ln {a_{i}}}, \end{aligned} \tag{18}$$

and the equalities are attained if and only if \(x_{i}=a_{i}\) for \(i=1,\ldots ,n\).

Corollary 2

Let \(\varvec{x}\in (0,+\infty )^{n}\) be a vector of n positive reals. Let \(\varvec{\delta }\in \{0,1\}^{n}\) be a vector of n bits. Let \(\varvec{\beta }\in [0,+\infty )^{n}\) be a nonzero vector of n nonnegative reals such that \(\beta _{j}=0\) if \(\delta _{j}=0\). Let \(b=\sum _{i=1}^{n}{\beta _{i}}>0\). Define \(0^{0}=1\). Then

$$\begin{aligned} \left( \varvec{\delta }^{\intercal }\varvec{x}\right) ^{b} \geqslant \frac{b^{b}}{\prod \limits _{i=1}^{n}\beta _{i}^{{\beta _{i}}}}\prod \limits _{i=1}^{n}x_{i}^{{\beta _{i}}}, \end{aligned}$$

where the equality is attained if and only if there exists a positive k such that \(x_{i}/\beta _{i}=k\) for each of the i’s having \(\delta _{i}=1\).

Example 7

Let \(n=5\), \(\varvec{\delta }=(1,0,1,0,1)^{\intercal }\), \(\varvec{\beta }=(3,0,4,0,6)^{\intercal }\), \(b=3+0+4+0+6=13\). Then \(\forall \varvec{x}\in (0,+\infty )^{n}\), we have

$$\begin{aligned}&(1x_{1}+0x_{2}+1x_{3}+0x_{4}+1x_{5})^{13}\\&\quad \geqslant \frac{13^{13}}{3^{3}0^{0}4^{4}0^{0}6^{6}}x_{1}^{3}x_{2}^{0}x_{3}^{4}x_{4}^{0}x_{5}^{6}, \end{aligned}$$

which attains the equality if and only if \(x_{1}:x_{3}:x_{5}=3:4:6\).
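A small Python sketch of Corollary 2, evaluated on the data of Example 7, is given below. The function name and the two test vectors are illustrative assumptions. Python evaluates `0 ** 0` as 1, which matches the convention \(0^{0}=1\) stated in the corollary.

```python
import math

def corollary2_sides(x, delta, beta):
    """Return (lhs, rhs) of (delta^T x)^b >= b^b / prod(beta_i^beta_i) * prod(x_i^beta_i)."""
    b = sum(beta)
    lhs = sum(d * xi for d, xi in zip(delta, x)) ** b
    # Entries with beta_i = 0 contribute 0 ** 0 == 1 and x_i ** 0 == 1.
    const = b**b / math.prod(bi**bi for bi in beta)
    rhs = const * math.prod(xi**bi for xi, bi in zip(x, beta))
    return lhs, rhs

delta = [1, 0, 1, 0, 1]
beta = [3, 0, 4, 0, 6]
lhs_eq, rhs_eq = corollary2_sides([3, 1, 4, 1, 6], delta, beta)  # x1:x3:x5 = 3:4:6
lhs_ne, rhs_ne = corollary2_sides([1, 1, 1, 1, 1], delta, beta)  # ratio violated
print(lhs_eq, rhs_eq)
print(lhs_ne, rhs_ne)
```

The entries \(x_{2}\) and \(x_{4}\) do not affect either side, since they are masked by \(\varvec{\delta }\) on the left and raised to the power zero on the right.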

Corollary 3

If we rescale each \(x_{i}\) by an independent positive constant \(c_{i}\), then we have a seemingly more general, but in fact equivalent, formulation of Lemma 1,

$$\begin{aligned} \prod \limits _{i=1}^{n}{x_{i}^{{a_{i}}}}\leqslant \frac{{\prod \limits _{i=1}^{n}{a_{i}^{{a_{i}}}}}}{{\prod \limits _{i=1}^{n}{c_{i}^{{a_{i}}}}{{\left( {\sum \limits _{i=1}^{n}{a_{i}}}\right) }^{\sum \limits _{i=1}^{n}{a_{i}}}}}}{\left( {\sum \limits _{i=1}^{n}{{c_{i}}{x_{i}}}}\right) ^{\sum \limits _{i=1}^{n}{a_{i}}}}, \end{aligned}$$

which attains the equality if and only if there exists some positive constant k such that \({{c_{i}}{x_{i}}}/{a_{i}}=k\) for all i.

Example 8

Let \(n=3\), \(a=(1,2,3)\), \(c=(4,5,6)\), then we have

$$\begin{aligned}&\left( {4{x_{1}}}\right) {\left( {\frac{{5{x_{2}}}}{2}}\right) ^{2}}{\left( {\frac{{6{x_{3}}}}{3}}\right) ^{3}}\\&\quad \leqslant {\left( {\frac{{4{x_{1}}+\frac{{5{x_{2}}}}{2}+\frac{{5{x_{2}}}}{2}+\frac{{6{x_{3}}}}{3}+\frac{{6{x_{3}}}}{3}+\frac{{6{x_{3}}}}{3}}}{6}}\right) ^{6}}. \end{aligned}$$

Therefore,

$$\begin{aligned} {x_{1}}x_{2}^{2}x_{3}^{3}\leqslant \frac{1}{{{4^{1}}{5^{2}}{6^{3}}}}\frac{{{1^{1}}{2^{2}}{3^{3}}}}{{6^{6}}}{\left( {4{x_{1}}+5{x_{2}}+6{x_{3}}}\right) ^{6}}, \end{aligned}$$

which attains equality if and only if \(4{x_{1}}={5{x_{2}}}/2={6{x_{3}}}/3\) or \({x_{1}}:{x_{2}}:{x_{3}}=5:8:10\).
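The equality case of Example 8 can be confirmed exactly with rational arithmetic. This is a minimal sketch; the function name is an illustrative assumption.

```python
from fractions import Fraction

def example8_sides(x1, x2, x3):
    """Return (lhs, rhs) of x1 x2^2 x3^3 <= (1^1 2^2 3^3) / (4^1 5^2 6^3 6^6) * (4x1 + 5x2 + 6x3)^6."""
    x1, x2, x3 = Fraction(x1), Fraction(x2), Fraction(x3)
    lhs = x1 * x2**2 * x3**3
    const = Fraction(1**1 * 2**2 * 3**3, 4**1 * 5**2 * 6**3 * 6**6)
    rhs = const * (4 * x1 + 5 * x2 + 6 * x3) ** 6
    return lhs, rhs

print(example8_sides(5, 8, 10))   # both sides equal 320000
print(example8_sides(1, 1, 1))    # strict inequality
```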

Corollary 4

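Generalizing Corollary 3 to a linear transform \(\varvec{U}\) on vector \(\varvec{x}\), with \(\varvec{u}_{i}^{\intercal }\) denoting the ith row of \(\varvec{U}\) and \(\varvec{\theta }\) a vector of positive constants, we have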

$$\begin{aligned} \prod \limits _{i=1}^{n}{{\left( {\varvec{u}_{i}^{\intercal }\varvec{x}}\right) }^{{a_{i}}}}\le \left\{ {\prod \limits _{i=1}^{n}{{\left( {\frac{{a_{i}}}{{\theta _{i}}}}\right) }^{{a_{i}}}}}\right\} {\left( {\frac{{{\varvec{\theta }^{\intercal }}\varvec{U}\varvec{x}}}{{\sum \limits _{i=1}^{n}{a_{i}}}}}\right) ^{\sum \limits _{i=1}^{n}{a_{i}}}}, \end{aligned}$$

which attains the equality if and only if

$$\begin{aligned} \left[ {\begin{array}{ccc} {\frac{{\theta _{1}}}{{a_{1}}}} &{} &{} 0\\ &{} \ddots \\ 0 &{} &{} {\frac{{\theta _{n}}}{{a_{n}}}} \end{array}}\right] \varvec{U}\varvec{x}=k\mathbf {1}_{n}, \end{aligned}$$

where k is a constant and can be solved explicitly under an extra constraint such as an affine constraint on \(\varvec{x}\).

Example 9

Substituting \(x_{1}=2y_{1}+y_{2}\) and \(x_{2}=y_{1}+2y_{2}\) into the inequality of Example 5, we have

$$\begin{aligned} {\left( {2{y_{1}}+{y_{2}}}\right) ^{3}}{\left( {{y_{1}} +2{y_{2}}}\right) ^{2}}\le \frac{{{2^{2}}{3^{8}}}}{{5^{5}}}{\left( {{y_{1}}+{y_{2}}}\right) ^{5}}, \end{aligned}$$

which attains equality if and only if \(y_{1}=4y_{2}\). By requiring the constraint \(y_{1}+y_{2}=1\) on the solution, it follows

$$\begin{aligned} \left[ {\begin{array}{c} {y_{1}}\\ {y_{2}} \end{array}}\right] =\left[ {\begin{array}{c} {0.8}\\ {0.2} \end{array}}\right] , \end{aligned}$$

and the unique maximum of \({\left( {2{y_{1}}+{y_{2}}}\right) ^{3}} {\left( {{y_{1}}+2{y_{2}}}\right) ^{2}}\) attained is \({{2^{2}}{3^{8}}}/{5^{5}}\approx 8.398\).
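The maximizer in Example 9 can be cross-checked by brute force. The sketch below scans a grid on the constraint \(y_{1}+y_{2}=1\); the grid resolution and the names are illustrative assumptions, and a grid search is used only as a check, not as the method of the article.

```python
def objective(y1):
    """(2 y1 + y2)^3 (y1 + 2 y2)^2 restricted to y1 + y2 = 1."""
    y2 = 1.0 - y1
    return (2 * y1 + y2) ** 3 * (y1 + 2 * y2) ** 2

grid = [i / 1000 for i in range(1, 1000)]   # interior points of (0, 1)
best = max(grid, key=objective)
bound = 2**2 * 3**8 / 5**5                  # value claimed in Example 9
print(best, objective(best), bound)
```

The grid maximizer should land on \(y_{1}=0.8\), with the objective value matching the bound up to rounding.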

We recursively apply the inequality to the objective, as this inequality transforms the maximization problem into a set of equality attainment conditions, which becomes a system of simple equations.

Appendix C: Proof of the ascent property and the linear rate of convergence of the weaver algorithm when s is sufficiently large

We instead maximize the log-likelihood with a Lagrange multiplier term to incorporate the equality constraint,

$$\begin{aligned} \ell (\varvec{p})={\varvec{a}^{\intercal }}\ln \varvec{p}+{\varvec{b}^{\intercal }}\ln {\varvec{\varDelta }^{\intercal }}\varvec{p}-s\left( {{\varvec{1}^{\intercal }}{\varvec{p}}- 1}\right) , \end{aligned}$$

where the Lagrange multiplier is the known constant

$$\begin{aligned} s=\varvec{1}^{\intercal }\varvec{a}+\varvec{1}^{\intercal }\varvec{b}, \end{aligned}$$

not adding an extra unknown.

The derivative of \(\ell (\varvec{p})\) with respect to \(p_{i}\) at iteration k is given by

$$\begin{aligned} \frac{{\partial \ell (\varvec{p})}}{{\partial p_{i}^{\left( k\right) }}}=\frac{{a_{i}}}{{p_{i}^{\left( k\right) }}}+\sum \limits _{j=1}^{q}{\frac{{{\varDelta _{ij}}{b_{j}}}}{\sum _{h=1}^{d}\varDelta _{hj}p_{h}^{\left( k\right) }}}-s. \end{aligned}$$

Combining the weaver steps 1 and 2, \(p_{i}^{(k)}\) is updated according to

$$\begin{aligned} p_{i}^{\left( {k+1}\right) }=\frac{{a_{i}}}{{s-\sum \limits _{j=1}^{q}{\frac{{{\varDelta _{ij}}{b_{j}}}}{\sum _{h=1}^{d}\varDelta _{hj}p_{h}^{\left( k\right) }}}}}. \end{aligned}$$
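The combined update can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `a` holds the complete-cell counts, `b` the variable-cell counts, and `delta` the \(d\times q\) indicator matrix as nested lists. The function names, the uniform starting point, the fixed iteration count, and the toy data are all hypothetical.

```python
def weaver_update(p, a, b, delta):
    """One combined update p_i <- a_i / (s - sum_j delta[i][j] * b[j] / (delta_j^T p))."""
    d, q = len(a), len(b)
    s = sum(a) + sum(b)
    # One ratio per variable cell j: b_j divided by the current probability of that cell.
    ratio = [b[j] / sum(delta[h][j] * p[h] for h in range(d)) for j in range(q)]
    return [a[i] / (s - sum(delta[i][j] * ratio[j] for j in range(q))) for i in range(d)]

def weaver(a, b, delta, iters=200):
    """Iterate the update from a uniform start for a fixed number of steps."""
    p = [1.0 / len(a)] * len(a)
    for _ in range(iters):
        p = weaver_update(p, a, b, delta)
    return p

# Toy likelihood p1^3 p2^2 p3^5 (p1 + p2)^2: three cells, one variable cell {1, 2}.
a = [3.0, 2.0, 5.0]
b = [2.0]
delta = [[1], [1], [0]]
p_hat = weaver(a, b, delta)
print(p_hat, sum(p_hat))
```

For this toy likelihood the maximizer on the simplex can be worked out by hand as \((7/20,\,7/30,\,5/12)\), so the printed vector can be compared against it.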

We seek to establish the positivity of the quantity

$$\begin{aligned} \left( {p_{i}^{\left( {k+1}\right) }-p_{i}^{\left( k\right) }}\right) \frac{{\partial \ell (\varvec{p})}}{{\partial p_{i}^{\left( k\right) }}}= & {} \left\{ {\frac{{a_{i}}}{{p_{i}^{\left( k\right) }}}+\sum \limits _{j=1}^{q}{\frac{{{\varDelta _{ij}}{b_{j}}}}{\sum _{h=1}^{d}\varDelta _{hj}p_{h}^{\left( k\right) }}}-s}\right\} \\&\times \left\{ {\frac{{a_{i}}}{{s-\sum \limits _{j=1}^{q}{\frac{{{\varDelta _{ij}}{b_{j}}}}{\sum _{h=1}^{d}\varDelta _{hj}p_{h}^{\left( k\right) }}}}}-p_{i}^{\left( k\right) }}\right\} \\= & {} \frac{{{\left( {{a_{i}}-p_{i}^{\left( k\right) }{v^{\left( k\right) }}}\right) }^{2}}}{{p_{i}^{\left( k\right) }{v^{\left( k\right) }}}}, \end{aligned}$$

where

$$\begin{aligned} {v^{\left( k\right) }}\equiv s-\sum \limits _{j=1}^{q}{\frac{{{\varDelta _{ij}}{b_{j}}}}{\sum _{h=1}^{d}\varDelta _{hj}p_{h}^{\left( k\right) }}}. \end{aligned}$$

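It is now clear that the last quantity is nonnegative whenever \(v^{\left( k\right) }>0\), and strictly positive unless \(a_{i}=p_{i}^{\left( k\right) }v^{\left( k\right) }\), that is, unless \(p_{i}^{\left( k\right) }\) is already a fixed point of the update. Note that \(v^{\left( k\right) }\) depends on i through \(\varDelta _{ij}\). Under this condition, every step of the iteration increases \(\ell (\varvec{p})\). Since \(\ell (\varvec{p})\) is bounded from above, the iteration converges.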
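The algebraic identity behind the ascent argument can be verified numerically at an arbitrary point. In this minimal sketch the function name, the toy data, and the evaluation point `p` are illustrative assumptions.

```python
def score_and_v(p, a, b, delta, i):
    """Return (g_i, v_i): the derivative of l(p) in p_i, and v = s - sum_j delta_ij b_j / (delta_j^T p)."""
    d, q = len(a), len(b)
    s = sum(a) + sum(b)
    weave = sum(
        delta[i][j] * b[j] / sum(delta[h][j] * p[h] for h in range(d))
        for j in range(q)
    )
    return a[i] / p[i] + weave - s, s - weave

a, b, delta = [3.0, 2.0, 5.0], [2.0], [[1], [1], [0]]
p = [0.5, 0.2, 0.3]          # arbitrary positive starting point
pairs = []
for i in range(len(a)):
    g, v = score_and_v(p, a, b, delta, i)
    step = a[i] / v - p[i]   # p_i^(k+1) - p_i^(k)
    pairs.append((step * g, (a[i] - p[i] * v) ** 2 / (p[i] * v)))
print(pairs)
```

Each pair should contain two matching nonnegative numbers, since \(v>0\) for every i at this point.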

Next, we show the rate of convergence is linear. We denote the ith component of the solution as \(p_{i}^{(*)}\) and use the simpler symbol g to denote the derivative function \( g\left( p_{i}\right) \equiv \frac{\partial \ell (\varvec{p})}{\partial p_{i}}, \) hence \(g\left( p_{i}^{\left( *\right) }\right) =0\). We assume \(\ell (\varvec{p})\) is locally concave at \(\varvec{p}^{\left( *\right) }\) and assume g to be Lipschitz continuous, viz. there exists a positive constant L such that, for all pairs of \(\left( p,q\right) \) in the domain, \(\left| g\left( p\right) -g\left( q\right) \right| \le L\left| p-q\right| \). Then, we have

$$\begin{aligned} p_{i}^{\left( {k+1}\right) }-p_{i}^{\left( *\right) }= & {} \frac{{a_{i}}}{{\frac{{a_{i}}}{{p_{i}^{\left( k\right) }}}-g\left( {p_{i}^{\left( k\right) }}\right) }}-p_{i}^{\left( *\right) }\\= & {} \frac{{{a_{i}}p_{i}^{\left( k\right) }}}{{{a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }}-p_{i}^{\left( *\right) }\\= & {} \frac{{{a_{i}}\left( {p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}\right) +p_{i}^{\left( *\right) }p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }}{{{a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }}, \end{aligned}$$

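and further, dividing both sides by \(p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }\),

$$\begin{aligned} \frac{{p_{i}^{\left( {k+1}\right) }-p_{i}^{\left( *\right) }}}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}=\frac{{{a_{i}}+p_{i}^{\left( *\right) }p_{i}^{\left( k\right) }\frac{{g\left( {p_{i}^{\left( k\right) }}\right) }}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}}}{{{a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }}. \end{aligned}$$

If \(p_{i}^{\left( k\right) }<p_{i}^{\left( *\right) }\), then \(g\left( {p_{i}^{\left( k\right) }}\right) >0\) and \(\frac{{g\left( {p_{i}^{\left( k\right) }}\right) }}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}<0\). Therefore, because \(p_{i}^{\left( k\right) }>0\) implies \(p_{i}^{\left( *\right) }/\left( p_{i}^{\left( *\right) }-p_{i}^{\left( k\right) }\right) >1\),

$$\begin{aligned} {{a_{i}}+p_{i}^{\left( *\right) }p_{i}^{\left( k\right) }\frac{{g\left( {p_{i}^{\left( k\right) }}\right) }}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}}={a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) \frac{{p_{i}^{\left( *\right) }}}{{p_{i}^{\left( *\right) }-p_{i}^{\left( k\right) }}}<{{a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }. \end{aligned}$$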

If \(p_{i}^{\left( k\right) }>p_{i}^{\left( *\right) }\), then \(g\left( {p_{i}^{\left( k\right) }}\right) <0\). Therefore, \(\frac{{g\left( {p_{i}^{\left( k\right) }}\right) }}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}<0\) and

$$\begin{aligned} {{a_{i}}+p_{i}^{\left( *\right) }p_{i}^{\left( k\right) }\frac{{g\left( {p_{i}^{\left( k\right) }}\right) }}{{p_{i}^{\left( k\right) }-p_{i}^{\left( *\right) }}}}<a_{i}<{{a_{i}}-p_{i}^{\left( k\right) }g\left( {p_{i}^{\left( k\right) }}\right) }. \end{aligned}$$

In both cases, the numerator is smaller than the denominator, hence \(\left| \frac{{p_{i}^{\left( {k+1}\right) } -p_{i}^{\left( *\right) }}}{{p_{i}^{\left( k\right) } -p_{i}^{\left( *\right) }}}\right| <1\) and the rate of convergence is linear.
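The contraction can be observed empirically on a toy problem. The sketch below is only an illustration under assumptions: the data, the helper name `update`, and the closed-form maximizer `p_star` (worked out by hand for the toy likelihood \(p_{1}^{3}p_{2}^{2}p_{3}^{5}(p_{1}+p_{2})^{2}\)) are not taken from the article.

```python
def update(p, a, b, delta):
    """Combined weaver-style update p_i <- a_i / (s - sum_j delta_ij b_j / (delta_j^T p))."""
    d, q = len(a), len(b)
    s = sum(a) + sum(b)
    cell = [sum(delta[h][j] * p[h] for h in range(d)) for j in range(q)]
    return [a[i] / (s - sum(delta[i][j] * b[j] / cell[j] for j in range(q))) for i in range(d)]

a, b, delta = [3.0, 2.0, 5.0], [2.0], [[1], [1], [0]]
p_star = [7 / 20, 7 / 30, 5 / 12]    # hand-derived maximizer for this toy problem
p = [1 / 3, 1 / 3, 1 / 3]
errors = []
for _ in range(12):
    p = update(p, a, b, delta)
    errors.append(max(abs(pi - qi) for pi, qi in zip(p, p_star)))

# Ratio of successive errors; a roughly constant value below 1 indicates linear convergence.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(ratios)
```

A sequence of ratios that settles at a constant strictly between 0 and 1 is the numerical signature of a linear rate.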

Appendix D: Ranking results of the car racing data

See Table 4.

Table 4 NASCAR2002 car racing data: complete ranking results using the Plackett–Luce and Bradley–Terry models

Cite this article

Dong, F., Yin, G. Maximum likelihood estimation for incomplete multinomial data via the weaver algorithm. Stat Comput 28, 1095–1117 (2018). https://doi.org/10.1007/s11222-017-9782-2
