Model-based Clustering of Count Processes


Abstract

A model-based clustering method based on the Gaussian Cox process is proposed to address the problem of clustering count process data. The model allows for nonparametric estimation of the intensity functions of Poisson processes while simultaneously clustering the count process observations. A logistic Gaussian process transformation is imposed on the intensity functions to enforce smoothness. Maximum likelihood parameter estimation is carried out via the EM algorithm, while model selection is addressed using a cross-validated likelihood approach. The proposed model and methodology are applied to two datasets.
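For intuition, the following minimal sketch illustrates the generative idea described above: a Gaussian process draw is passed through the logistic (softmax) transform to give a smooth normalized intensity, and Poisson counts are simulated on a discretized time axis. The squared-exponential covariance, the grid of m equal-width bins, and the variable names are illustrative assumptions rather than the exact specification used in the paper.

```python
import numpy as np

def simulate_logistic_gp_counts(m=100, T=24.0, alpha=50.0, length_scale=2.0, seed=1):
    """Illustrative sketch (not the authors' code): draw f from a Gaussian process
    on a grid of m bins over [0, T), map it to a smooth normalized intensity via
    the logistic (softmax) transform, and simulate Poisson counts with rate alpha."""
    rng = np.random.default_rng(seed)
    s = (np.arange(m) + 0.5) * T / m                        # bin midpoints
    K = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2 / length_scale ** 2)
    f = rng.multivariate_normal(np.zeros(m), K + 1e-8 * np.eye(m))
    lam = np.exp(f - f.max())
    lam /= lam.sum() * (T / m)                              # intensity integrates to 1 over [0, T)
    counts = rng.poisson(alpha * lam * (T / m))             # counts per bin; total count ~ Poisson(alpha)
    return s, lam, counts
```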


Notes

  1. Historical data available at https://www.capitalbikeshare.com/

  2. Data obtained from http://sociograph.blogspot.ie/2011/04/communication-networks-part-2-mit.html

References

  • Abraham, C., Cornillon, P. A., Matzner-Løber, E., & Molinari, N. (2003). Unsupervised curve clustering using B-splines. Scand. J. Statist., 30, 581–595.

  • Adams, R. P., Murray, I., & MacKay, D. J. C. (2009). Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th annual international conference on machine learning (pp. 9–16).

  • Basu, S., & Dassios, A. (2002). A Cox process with log-normal intensity. Insurance Math. Econom., 31, 297–302.

  • Baudry, J.-P., Raftery, A. E., Celeux, G., Lo, K., & Gottardo, R. (2010). Combining mixture components for clustering. J. Comput. Graph. Statist., 19, 332–353.

  • Bengio, Y., & Grandvalet, Y. (2004). No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research, 5, 1089–1105.

  • Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Comput. Statist. Data Anal., 71, 52–78.

  • Bouveyron, C., Celeux, G., Murphy, T. B., & Raftery, A. E. (2019). Model-based clustering and classification for data science: with applications in R. Cambridge: Cambridge University Press.

  • Chen, J. (2017). Consistency of the MLE under mixture models. Statist. Sci., 32, 47–63.

  • Chen, J., Li, S., & Tan, X. (2016). Consistency of the penalized MLE for two-parameter gamma mixture models. Sci. China Math., 59, 2301–2318.

  • Chen, J., Tan, X., & Zhang, R. (2008). Inference for normal mixtures in mean and variance. Statist. Sinica, 18, 443–465.

  • Cheng, R. C. H., & Liu, W. B. (2001). The consistency of estimators in finite mixture models. Scand. J. Statist., 28, 603–616.

  • Ciuperca, G., Ridolfi, A., & Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Scand. J. Statist., 30, 45–59.

  • Côme, E., & Latifa, O. (2014). Model-based count series clustering for bike-sharing system usage mining: a case study with the Vélib’ system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), Special Section on Urban Computing, 5.

  • Cunningham, J. P., Shenoy, K. V., & Sahani, M. (2008). Fast Gaussian process methods for point process intensity estimation. In Proceedings of the 25th international conference on machine learning (pp. 192–199).

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B, 39, 1–38.

  • Eagle, N., & Pentland, A. (2006). Reality mining: sensing complex social systems. Pers. Ubiquitous Comput., 10, 255–268.

  • Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc., 97, 611–631.

  • Fraley, C., & Raftery, A. E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24, 155–181.

  • Giacofci, M., Lambert-Lacroix, S., Marot, G., & Picard, F. (2013). Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics, 69, 31–40.

  • Heikkinen, J., & Arjas, E. (1999). Modeling a Poisson forest in variable elevations: a nonparametric Bayesian approach. Biometrics, 55, 738–745.

  • Jacques, J., & Preda, C. (2014). Model-based clustering for multivariate functional data. Comput. Statist. Data Anal., 71, 92–106.

  • Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist., 27, 887–906.

  • Kim, H., & Ghahramani, Z. (2006). Bayesian Gaussian process classification with the EM-EP algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1948–1959.

  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th international joint conference on artificial intelligence (IJCAI’95), Vol. 2 (pp. 1137–1143). San Francisco, CA: Morgan Kaufmann Publishers Inc.

  • Lenk, P. J. (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. J. Amer. Statist. Assoc., 83, 509–516.

  • Lenk, P. J. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika, 78, 531–543.

  • Leonard, T. (1978). Density estimation, stochastic processes and prior information (with discussion). J. Roy. Statist. Soc. Ser. B, 40, 113–146.

  • Lloyd, C., Gunter, T., Osborne, M. A., & Roberts, S. J. (2015). Variational inference for Gaussian process modulated Poisson processes. In Proceedings of the 32nd international conference on machine learning, Vol. 37 (pp. 1814–1822).

  • McNicholas, P. D. (2016). Mixture model-based classification. Boca Raton: CRC Press.

  • Murray, I., MacKay, D., & Adams, R. P. (2009). The Gaussian process density sampler. In Koller, D., Schuurmans, D., Bengio, Y., & Bottou, L. (Eds.), Advances in neural information processing systems 21 (pp. 9–16). Curran Associates, Inc.

  • O’Hagan, A., Murphy, T. B., Scrucca, L., & Gormley, I. C. (2019). Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap. Comput. Stat., 34, 1779–1813.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Adaptive Computation and Machine Learning. Cambridge: MIT Press.

  • Sapatinas, T. (1995). Identifiability of mixtures of power-series distributions and related characterizations. Ann. Inst. Statist. Math., 47, 447–459.

  • Smyth, P. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput., 10, 63–72.

  • Tokdar, S. T., & Ghosh, J. K. (2007). Posterior consistency of logistic Gaussian process priors in density estimation. J. Statist. Plann. Inference, 137, 34–42.

  • Williams, C. K. I., & Rasmussen, C. E. (1996). Gaussian processes for regression. In Advances in neural information processing systems 8 (pp. 514–520). Cambridge: MIT Press.

  • Zhang, Y., & Yang, Y. (2015). Cross-validation for selecting a model selection procedure. J. Econom., 187, 95–112.


Author information


Correspondence to Tin Lok James Ng.


Appendices

Appendix A: Consistency of Penalized MLE

Sufficient conditions for \(\hat {P}\) to be strongly consistent are given in Kiefer and Wolfowitz (1956) and are stated below.

  • (C1) Identifiability: Let H(yi; αi, P) be the cumulative distribution function corresponding to h(yi; αi, P). If for any αi, H(yi; αi, P) = H(yi; αi, P′) for all yi, then D(P, P′) = 0.

  • (C2) Continuity: The component parameter space \(\mathcal {F}\) is a closed set. For all yi, αi and any given P0,

    $$ \begin{array}{@{}rcl@{}} \lim_{P \rightarrow P_{0}} h(y_{i};{\alpha}_{i},P) = h(y_{i};{\alpha}_{i},P_{0}) {.} \end{array} $$
  • (C3) Finite Kullback–Leibler Information: For any P, there exists an 𝜖 > 0 such that

    $$ \begin{array}{@{}rcl@{}} E^{*}[ \log(h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P) ) / h(Y_{i}; {\alpha}_{i}, P^{*}) ) ]^{+} < \infty { ,} \end{array} $$

    where B𝜖(P) is the open ball of radius 𝜖 > 0 centered at P with respect to the metric D, and

    $$ \begin{array}{@{}rcl@{}} h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P)) = \sup_{\tilde{P} \in B_{\epsilon}(P) } h(Y_{i}; {\alpha}_{i}, \tilde{P}) { .} \end{array} $$
  • (C4) Compactness: The mixture density h(yi; αi, P), viewed as a function of P, can be continuously extended to a compact space \( \overline { \mathbb {P} }\).

Identifiability of the mixture model is established in Proposition 3.1. In the case of a mixture of Poisson processes, the component parameter space \( \mathcal {F} \) is clearly closed. To compactify the space of mixing distributions \(\mathbb { P}\), we note that the distance between any two mixing distributions is bounded above by:

$$ \begin{array}{@{}rcl@{}} {\int}_{\mathcal{F} } \exp(-|\mathbf{f}|) d \mathbf{f} < \infty {.} \end{array} $$

We can extend \(\mathbb {P}\) to \(\overline {\mathbb { P}}\) by including all sub-distributions ρP for any ρ ∈ [0, 1) as in Chen (2017).

We now show the continuity of h(yi; αi, P) on \( \overline { \mathbb {P} } \). Recall that \(D(P_{m},P) \rightarrow 0 \) if and only if \( P_{m} \rightarrow P\) in distribution, which holds if and only if \( {\int } u(\mathbf {f}) \, d P_{m}(\mathbf {f}) \rightarrow {\int } u(\mathbf {f}) \, dP(\mathbf {f}) \) for every bounded and continuous function u(⋅).

For any given \(\mathbf {f}_{0} \in \mathcal {F}\) and αi > 0, we clearly have \( \lim _{\mathbf {f} \rightarrow \mathbf {f}_{0} } h(y_{i};{\alpha }_{i},\mathbf {f}) = h(y_{i};{\alpha }_{i},\mathbf {f}_{0}) \) and \( \lim _{| \mathbf {f} | \rightarrow \infty } h(y_{i}; {\alpha }_{i}, \mathbf {f} ) = 0\). Hence, h(yi; αi, f) is continuous and bounded on \( \mathcal {F} \). Suppose that \(P_{m} \rightarrow P_{0} \in \overline {\mathbb {P}}\) in distribution. Then:

$$ \begin{array}{@{}rcl@{}} h(y_{i};{\alpha}_{i},P_{m}) = \int h(y_{i};{\alpha}_{i},\mathbf{f}) dP_{m}(\mathbf{f}) \rightarrow \int h(y_{i};{\alpha}_{i},\mathbf{f}) dP_{0}(\mathbf{f}) = h(y_{i};{\alpha}_{i},P_{0}) \end{array} $$

which shows that h(yi; αi, P) is continuous on \(\overline {\mathbb {P}}\).

To prove finite Kullback–Leibler information, we need the extra sufficient condition that all scaling factors {αi}i are bounded above, that is, \( {\alpha }_{i} < \alpha ^{(M)} < \infty \) almost surely for all i. Let f0 be a support point of P∗. Then there must be a positive constant δ such that:

$$ \begin{array}{@{}rcl@{}} h(y_{i};{\alpha}_{i},P^{*}) \ge \delta \left( \exp(-{\alpha}_{i} ) \prod\limits_{k=1}^{m} \frac{ ({\alpha}_{i} {\lambda}_{k})^{y_{k}} }{y_{k}!} \right) p(\mathbf{f}_{0}) \end{array} $$

uniformly for all αi < α(M). Therefore, we have that:

$$ \begin{array}{@{}rcl@{}} E^{*}(\log h(y_{i};{\alpha}_{i},P^{*} ) )\! \ge \!\sum\limits_{k=1}^{m} \log({\alpha}_{i} {\lambda}_{k}) E^{*}(Y_{i,k}) - \! \sum\limits_{k=1}^{m} E^{*}(\log Y_{i,k}!) - {\alpha}_{i} + \log(\delta) + \log p(\mathbf{f}_{0}) { .} \end{array} $$

Since αi < α(M) almost surely, both \(E^{*}(Y_{i,k})\) and \(E^{*}(\log (Y_{i,k}!))\) are finite for all k. Hence, \( E^{*}(\log h(y_{i};{\alpha }_{i},P^{*}) ) > - \infty \). Since h(yi; αi, P) is bounded from above, \( E^{*}(\log h(y_{i};{\alpha }_{i},P^{*}) ) < \infty \). Therefore, \( E^{*}|\log h(y_{i};{\alpha }_{i},P^{*}) | < \infty \).
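To make the finiteness claim explicit, one way to bound both expectations, using the fact that \(Y_{i,k}\) is conditionally Poisson with mean \({\alpha}_{i}{\lambda}_{k} \le \alpha^{(M)}\) (the logistic transform gives \({\lambda}_{k} \le 1\)), together with \(\log y! \le y \log y \le y^{2}\) for \(y \ge 1\) and the Poisson second moment \(\mu + \mu^{2}\), is:

$$ \begin{array}{@{}rcl@{}} E^{*}(Y_{i,k}) \le \alpha^{(M)} < \infty, \qquad E^{*}(\log Y_{i,k}!) \le E^{*}(Y_{i,k}^{2}) \le \alpha^{(M)} + (\alpha^{(M)})^{2} < \infty . \end{array} $$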

Since \(h(Y_{i};{\alpha }_{i},B_{\epsilon}(P)) < c < \infty \) for any P, 𝜖 and αi,

$$ \begin{array}{@{}rcl@{}} E^{*}[ \log(h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P) ) / h(Y_{i};{\alpha}_{i}, P^{*}) ) ]^{+} < \log(c) - E^{*}(\log h(Y_{i};{\alpha}_{i}, P^{*})) < \infty { .} \end{array} $$

Therefore, we have shown that the four conditions above are satisfied, and hence the penalized MLE under the mixture of Poisson processes model is strongly consistent.

Appendix B: Derivation of EM Algorithm

Recall the complete data log-likelihood defined in Eq. 10. The E-step requires the computation of the expected value of the complete data log-likelihood with respect to the conditional distribution of Z given x and the current parameter estimates \(\hat {\theta }^{(t)}\). This conditional expectation can be expressed as:

$$ \begin{array}{@{}rcl@{}} E_{Z|x,\hat{\theta}^{(t)}}(\log L(\theta;x,Z)) &=& \sum\limits_{i=1}^{n} \sum\limits_{g=1}^{G} \hat{\pi}_{i}^{g} \left[ \log(\tau_{g}) - {\alpha}_{i} + \sum\limits_{j=1}^{n_{i}} \left( \log({\alpha}_{i}) + \log({\lambda}_{g}(x_{ij})) \right) \right] \\ &&+ \sum\limits_{g=1}^{G} \log(p(\mathbf{f}_{g})) \end{array} $$
(16)
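For concreteness, the following minimal sketch shows how the responsibilities \(\hat{\pi}_{i}^{g}\) entering this expectation could be computed on the discretized grid. The function name, the representation of events by their bin indices, and the omission of terms that are constant across clusters (such as those involving \({\alpha}_{i}\)) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp

def e_step_responsibilities(event_bins, tau, F):
    """Hypothetical E-step sketch: responsibilities pi_hat_i^g on a grid of m bins.

    event_bins : list of integer arrays, the bin index of each event x_{ij} for observation i
    tau        : mixing proportions tau_g, length G
    F          : G x m array of values f_{g,k}; log lambda_g is the log-softmax of each row
    Terms common to all clusters (e.g. those involving alpha_i) cancel and are omitted.
    """
    log_lam = F - logsumexp(F, axis=1, keepdims=True)        # log lambda_{g,k} via the logistic transform
    log_post = np.empty((len(event_bins), len(tau)))
    for i, bins_i in enumerate(event_bins):
        log_post[i] = np.log(tau) + log_lam[:, bins_i].sum(axis=1)
    log_post -= logsumexp(log_post, axis=1, keepdims=True)   # normalize on the log scale
    return np.exp(log_post)                                  # rows are pi_hat_i^g, summing to 1
```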

The M-step finds the parameter values θ that maximize this expected complete data log-likelihood. It is straightforward to show (by differentiation) that the maximizing values of α and τ are given by Eqs. 12 and 13, respectively.
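As a sketch of that differentiation (assuming Eq. 16 exactly as displayed above; Eqs. 12 and 13 themselves appear in the main text and are not reproduced here), the terms of Eq. 16 involving \({\alpha}_{i}\) reduce to \(-{\alpha}_{i} + n_{i}\log{\alpha}_{i}\) since \({\sum}_{g} \hat{\pi}_{i}^{g} = 1\), while the terms in τ are maximized subject to \({\sum}_{g} \tau_{g} = 1\), giving:

$$ \begin{array}{@{}rcl@{}} \frac{\partial}{\partial {\alpha}_{i}}\left( -{\alpha}_{i} + n_{i}\log{\alpha}_{i} \right) = -1 + \frac{n_{i}}{{\alpha}_{i}} = 0 \; \Rightarrow \; \hat{\alpha}_{i} = n_{i}, \qquad \hat{\tau}_{g} = \frac{1}{n}\sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} . \end{array} $$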

To optimize Eq. 16 with respect to fg, we write fg = (fg,1,⋯ , fg, m)T and retain terms that involve fg for g = 1,⋯ , G to obtain:

$$ \begin{array}{@{}rcl@{}} Q & \equiv & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} \log({\lambda}_{g}(x_{ij})) + \log(p(\mathbf{f}_{g}) ) \\ & = & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} \log \left[ \frac{ \exp(f_{g}(x_{ij}) ) }{ {\sum}_{k=1}^{m} \exp(f_{g,k}) } \right] - \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \\ & = & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} f_{g}(x_{ij}) - \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} \log\left[ \sum\limits_{k=1}^{m} \exp(f_{g,k}) \right]- \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \\ & = & \mathbf{b}^{T} \mathbf{f}_{g} - \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} \log\left[ \sum\limits_{k=1}^{m} \exp(f_{g,k}) \right]- \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \end{array} $$

where b = (b1,⋯ , bm)T with bk defined by:

$$ \begin{array}{@{}rcl@{}} b_{k} = \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} I(x_{ij} \in [ s_{k-1}, s_{k} )) \end{array} $$

where [sk−1, sk) is the kth of the m equally spaced intervals of [0, T).
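The weighted bin counts b can be computed directly with a weighted histogram. The following is a minimal sketch for a single cluster g; the function name and the data layout (a list of event-time arrays together with the corresponding responsibilities) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_bin_counts(event_times, resp_g, m, T):
    """Sketch of b for cluster g: b_k = sum_i pi_hat_i^g * #{j : x_ij in [s_{k-1}, s_k)}.

    event_times : list of 1-D arrays, the event times x_{ij} of each observation i
    resp_g      : responsibilities pi_hat_i^g for cluster g, one per observation
    m, T        : number of equally spaced bins on [0, T)
    """
    times = np.concatenate(event_times)
    weights = np.concatenate([np.full(len(x), w) for x, w in zip(event_times, resp_g)])
    b, _ = np.histogram(times, bins=m, range=(0.0, T), weights=weights)
    return b
```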

For g = 1,⋯ , G, we now optimize Q with respect to fg; as no analytical expression for the maximizer exists, we use Newton’s method. For each g, the Jacobian Jg(Q) of Q can be written as:

$$ \begin{array}{@{}rcl@{}} J_{g}(Q) = \mathbf{b} - \mathbf{u}_{g}\sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} - \frac{1}{2}\mathbf{f}_{g}^{T}(K^{-1}+K^{-T}) \end{array} $$

where ug = (ug,1,⋯ , ug, m)T with

$$ \begin{array}{@{}rcl@{}} u_{g,l} = \frac{\exp(f_{g,l})}{{\sum}_{k=1}^{m} \exp(f_{g,k})} \end{array} $$

for l = 1,⋯ , m. The Hessian Hg(Q) can be written as:

$$ \begin{array}{@{}rcl@{}} H_{g}(Q) = - (\text{diag}(\mathbf{u}_{g}) - \mathbf{u}_{g} \mathbf{u}_{g}^{T}) \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} - \frac{1}{2}(K^{-1}+K^{-T}) \end{array} $$
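The Newton update for each cluster is then \(\mathbf{f}_{g} \leftarrow \mathbf{f}_{g} - H_{g}(Q)^{-1} J_{g}(Q)^{T}\). A minimal sketch of one such update, built from the gradient and Hessian above, is given below; it assumes K is symmetric (so that \(\frac{1}{2}(K^{-1}+K^{-T}) = K^{-1}\)) and uses illustrative variable names rather than the authors' implementation.

```python
import numpy as np

def newton_update_f(f_g, b, N_g, K):
    """One Newton step for f_g using the gradient and Hessian displayed above.

    f_g : current vector f_g of length m
    b   : the weighted bin counts b defined above (length m)
    N_g : sum_i pi_hat_i^g * n_i, the responsibility-weighted number of events
    K   : m x m Gaussian process covariance matrix on the bin grid (symmetric)
    """
    K_inv = np.linalg.inv(K)                      # symmetric K: (K^-1 + K^-T)/2 = K^-1
    u = np.exp(f_g - f_g.max())
    u /= u.sum()                                  # u_g: softmax of f_g
    grad = b - N_g * u - K_inv @ f_g              # J_g(Q) written as a column vector
    hess = -N_g * (np.diag(u) - np.outer(u, u)) - K_inv
    return f_g - np.linalg.solve(hess, grad)      # f_g <- f_g - H_g(Q)^{-1} J_g(Q)
```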


Cite this article

Ng, T.L.J., Murphy, T.B. Model-based Clustering of Count Processes. J Classif 38, 188–211 (2021). https://doi.org/10.1007/s00357-020-09363-4
