Model-based Clustering of Count Processes


Abstract

A model-based clustering method based on the Gaussian Cox process is proposed to address the problem of clustering count process data. The model allows for nonparametric estimation of the intensity functions of Poisson processes while simultaneously clustering the count process observations. A logistic Gaussian process transformation is imposed on the intensity functions to enforce smoothness. Maximum likelihood parameter estimation is carried out via the EM algorithm, while model selection is addressed using a cross-validated likelihood approach. The proposed model and methodology are applied to two datasets.
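For intuition, the following minimal sketch illustrates the generative idea described above: a Gaussian process draw is passed through the logistic (softmax) transform to give a smooth normalized intensity, and Poisson counts are simulated on a discretized time axis. The squared-exponential covariance, the grid of m equal-width bins, and the variable names are illustrative assumptions rather than the exact specification used in the paper.

```python
import numpy as np

def simulate_logistic_gp_counts(m=100, T=24.0, alpha=50.0, length_scale=2.0, seed=1):
    """Illustrative sketch (not the authors' code): draw f from a Gaussian process
    on a grid of m bins over [0, T), map it to a smooth normalized intensity via
    the logistic (softmax) transform, and simulate Poisson counts with rate alpha."""
    rng = np.random.default_rng(seed)
    s = (np.arange(m) + 0.5) * T / m                        # bin midpoints
    K = np.exp(-0.5 * (s[:, None] - s[None, :]) ** 2 / length_scale ** 2)
    f = rng.multivariate_normal(np.zeros(m), K + 1e-8 * np.eye(m))
    lam = np.exp(f - f.max())
    lam /= lam.sum() * (T / m)                              # intensity integrates to 1 over [0, T)
    counts = rng.poisson(alpha * lam * (T / m))             # counts per bin; total count ~ Poisson(alpha)
    return s, lam, counts
```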


Notes

  1. Historical data available at https://www.capitalbikeshare.com/

  2. Data obtained from http://sociograph.blogspot.ie/2011/04/communication-networks-part-2-mit.html

References

  • Abraham, C., Cornillon, P. A., Matzner-Løber, E., & Molinari, N. (2003). Unsupervised curve clustering using B-splines. Scand. J. Statist., 30, 581–595.

  • Adams, R. P., Murray, I., & MacKay, D. J. C. (2009). Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th annual international conference on machine learning (pp. 9–16).

  • Basu, S., & Dassios, A. (2002). A Cox process with log-normal intensity. Insurance Math. Econom., 31, 297–302.

  • Baudry, J.-P., Raftery, A. E., Celeux, G., Lo, K., & Gottardo, R. (2010). Combining mixture components for clustering. J. Comput. Graph. Statist., 19, 332–353.

  • Bengio, Y., & Grandvalet, Y. (2004). No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research, 5, 1089–1105.

  • Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Comput. Statist. Data Anal., 71, 52–78.

  • Bouveyron, C., Celeux, G., Murphy, T. B., & Raftery, A. E. (2019). Model-based clustering and classification for data science: with applications in R. Cambridge: Cambridge University Press.

  • Chen, J. (2017). Consistency of the MLE under mixture models. Statist. Sci., 32, 47–63.

  • Chen, J., Li, S., & Tan, X. (2016). Consistency of the penalized MLE for two-parameter gamma mixture models. Sci. China Math., 59, 2301–2318.

  • Chen, J., Tan, X., & Zhang, R. (2008). Inference for normal mixtures in mean and variance. Statist. Sinica, 18, 443–465.

  • Cheng, R. C. H., & Liu, W. B. (2001). The consistency of estimators in finite mixture models. Scand. J. Statist., 28, 603–616.

  • Ciuperca, G., Ridolfi, A., & Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Scand. J. Statist., 30, 45–59.

  • Côme, E., & Latifa, O. (2014). Model-based count series clustering for bike-sharing system usage mining: a case study with the Vélib’ system of Paris. ACM Transactions on Intelligent Systems and Technology (TIST), Special Section on Urban Computing, 5.

  • Cunningham, J. P., Shenoy, K. V., & Sahani, M. (2008). Fast Gaussian process methods for point process intensity estimation. In Proceedings of the 25th international conference on machine learning (pp. 192–199).

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B, 39, 1–38.

  • Eagle, N., & Pentland, A. (2006). Reality mining: sensing complex social systems. Pers. Ubiquitous Comput., 10, 255–268.

  • Fraley, C., & Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc., 97, 611–631.

  • Fraley, C., & Raftery, A. E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24, 155–181.

  • Giacofci, M., Lambert-Lacroix, S., Marot, G., & Picard, F. (2013). Wavelet-based clustering for mixed-effects functional models in high dimension. Biometrics, 69, 31–40.

  • Heikkinen, J., & Arjas, E. (1999). Modeling a Poisson forest in variable elevations: a nonparametric Bayesian approach. Biometrics, 55, 738–745.

  • Jacques, J., & Preda, C. (2014). Model-based clustering for multivariate functional data. Comput. Statist. Data Anal., 71, 92–106.

  • Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist., 27, 887–906.

  • Kim, H., & Ghahramani, Z. (2006). Bayesian Gaussian process classification with the EM-EP algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 1948–1959.

  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th international joint conference on artificial intelligence (IJCAI’95), Vol. 2 (pp. 1137–1143). San Francisco, CA: Morgan Kaufmann Publishers Inc.

  • Lenk, P. J. (1988). The logistic normal distribution for Bayesian, nonparametric, predictive densities. J. Amer. Statist. Assoc., 83, 509–516.

  • Lenk, P. J. (1991). Towards a practicable Bayesian nonparametric density estimator. Biometrika, 78, 531–543.

  • Leonard, T. (1978). Density estimation, stochastic processes and prior information (with discussion). J. Roy. Statist. Soc. Ser. B, 40, 113–146.

  • Lloyd, C., Gunter, T., Osborne, M. A., & Roberts, S. J. (2015). Variational inference for Gaussian process modulated Poisson processes. In Proceedings of the 32nd international conference on machine learning, Vol. 37 (pp. 1814–1822).

  • McNicholas, P. D. (2016). Mixture model-based classification. Boca Raton: CRC Press.

  • Murray, I., MacKay, D., & Adams, R. P. (2009). The Gaussian process density sampler. In Koller, D., Schuurmans, D., Bengio, Y., & Bottou, L. (Eds.), Advances in neural information processing systems 21 (pp. 9–16). Curran Associates, Inc.

  • O’Hagan, A., Murphy, T. B., Scrucca, L., & Gormley, I. C. (2019). Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap. Comput. Stat., 34, 1779–1813.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Adaptive Computation and Machine Learning. Cambridge: MIT Press.

  • Sapatinas, T. (1995). Identifiability of mixtures of power-series distributions and related characterizations. Ann. Inst. Statist. Math., 47, 447–459.

  • Smyth, P. (2000). Model selection for probabilistic clustering using cross-validated likelihood. Stat. Comput., 10, 63–72.

  • Tokdar, S. T., & Ghosh, J. K. (2007). Posterior consistency of logistic Gaussian process priors in density estimation. J. Statist. Plann. Inference, 137, 34–42.

  • Williams, C. K. I., & Rasmussen, C. E. (1996). Gaussian processes for regression. In Advances in neural information processing systems 8 (pp. 514–520). Cambridge: MIT Press.

  • Zhang, Y., & Yang, Y. (2015). Cross-validation for selecting a model selection procedure. J. Econom., 187, 95–112.


Author information


Correspondence to Tin Lok James Ng.


Appendices

Appendix A: Consistency of Penalized MLE

Sufficient conditions for \(\hat {P}\) to be strongly consistent are given in Kiefer and Wolfowitz (1956) and are stated below.

  • (C1) Identifiability: Let H(yi; αi, P) be the cumulative distribution function corresponding to h(yi; αi, P). If for any αi, H(yi; αi, P) = H(yi; αi, P′) for all yi, then D(P, P′) = 0.

  • (C2) Continuity: The component parameter space \(\mathcal {F}\) is a closed set. For all yi, αi and any given P0,

    $$ \begin{array}{@{}rcl@{}} \lim_{P \rightarrow P_{0}} h(y_{i};{\alpha}_{i},P) = h(y_{i};{\alpha}_{i},P_{0}) {.} \end{array} $$
  • (C3) Finite Kullback–Leibler Information: For any P, there exists an 𝜖 > 0 such that

    $$ \begin{array}{@{}rcl@{}} E^{*}[ \log(h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P) ) / h(Y_{i}; {\alpha}_{i}, P^{*}) ) ]^{+} < \infty { ,} \end{array} $$

    where B𝜖(P) is the open ball of radius 𝜖 > 0 centered at P with respect to the metric D, and

    $$ \begin{array}{@{}rcl@{}} h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P)) = \sup_{\tilde{P} \in B_{\epsilon}(P) } h(Y_{i}; {\alpha}_{i}, \tilde{P}) { .} \end{array} $$
  • (C4) Compactness: The mixture density h(yi; αi, P), viewed as a function of P, can be continuously extended to a compact space \( \overline { \mathbb {P} }\).

Identifiability of the mixture model is established in Proposition 3.1. In the case of a mixture of Poisson processes, the component parameter space \( \mathcal {F} \) is clearly closed. To compactify the space of mixing distributions \(\mathbb { P}\), we note that the distance between any two mixing distributions is bounded above by:

$$ \begin{array}{@{}rcl@{}} {\int}_{\mathcal{F} } \exp(-|\mathbf{f}|) d \mathbf{f} < \infty {.} \end{array} $$

We can extend \(\mathbb {P}\) to \(\overline {\mathbb { P}}\) by including all sub-distributions ρP for any ρ ∈ [0, 1) as in Chen (2017).

We now show the continuity of h(yi; αi, P) on \( \overline { \mathbb {P} } \). Recall that \(D(P_{m},P) \rightarrow 0 \) if and only if \( P_{m} \rightarrow P\) in distribution, which holds if and only if \( {\int } u(\mathbf {f}) \, d P_{m}(\mathbf {f}) \rightarrow {\int } u(\mathbf {f}) \, dP(\mathbf {f}) \) for every bounded and continuous function u(⋅).

For any given \(\mathbf {f}_{0} \in \mathcal {F}\) and αi > 0, we clearly have \( \lim _{\mathbf {f} \rightarrow \mathbf {f}_{0} } h(y_{i};{\alpha }_{i},\mathbf {f}) = h(y_{i};{\alpha }_{i},\mathbf {f}_{0}) \) and \( \lim _{| \mathbf {f} | \rightarrow \infty } h(y_{i}; {\alpha }_{i}, \mathbf {f} ) = 0\). Hence, h(yi; αi, f) is continuous and bounded on \( \mathcal {F} \). Suppose that \(P_{m} \rightarrow P_{0} \in \overline {\mathbb {P}}\) in distribution. Then:

$$ \begin{array}{@{}rcl@{}} h(y_{i};{\alpha}_{i},P_{m}) = \int h(y_{i};{\alpha}_{i},\mathbf{f}) dP_{m}(\mathbf{f}) \rightarrow \int h(y_{i};{\alpha}_{i},\mathbf{f}) dP_{0}(\mathbf{f}) = h(y_{i};{\alpha}_{i},P_{0}) \end{array} $$

which shows that h(yi; αi, P) is continuous on \(\overline {\mathbb {P}}\).

To prove finite Kullback–Leibler information, we need the extra sufficient condition that all scaling factors {αi}i are bounded above, that is, \( {\alpha }_{i} < \alpha ^{(M)} < \infty \) almost surely for all i. Let f0 be a support point of P∗. Then there must be a positive constant δ such that:

$$ \begin{array}{@{}rcl@{}} h(y_{i};{\alpha}_{i},P^{*}) \ge \delta \left( \exp(-{\alpha}_{i} ) \prod\limits_{k=1}^{m} \frac{ ({\alpha}_{i} {\lambda}_{k})^{y_{k}} }{y_{k}!} \right) p(\mathbf{f}_{0}) \end{array} $$

uniformly for all αi < α(M). Therefore, we have that:

$$ \begin{array}{@{}rcl@{}} E^{*}(\log h(y_{i};{\alpha}_{i},P^{*} ) )\! \ge \!\sum\limits_{k=1}^{m} \log({\alpha}_{i} {\lambda}_{k}) E^{*}(Y_{i,k}) - \! \sum\limits_{k=1}^{m} E^{*}(\log Y_{i,k}!) - {\alpha}_{i} + \log(\delta) + \log p(\mathbf{f}_{0}) { .} \end{array} $$

Since αi < α(M) almost surely, both \(E^{*}(Y_{i,k})\) and \(E^{*}(\log (Y_{i,k}!))\) are finite for all k. Hence, \( E^{*}(\log h(y_{i};{\alpha }_{i},P^{*}) ) > - \infty \). Since h(yi; αi, P) is bounded from above, \( E^{*}(\log h(y_{i};{\alpha }_{i},P^{*}) ) < \infty \). Therefore, \( E^{*}|\log h(y_{i};{\alpha }_{i},P^{*}) | < \infty \).
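To make the finiteness claim explicit, one way to bound both expectations, using the fact that \(Y_{i,k}\) is conditionally Poisson with mean \({\alpha}_{i}{\lambda}_{k} \le \alpha^{(M)}\) (the logistic transform gives \({\lambda}_{k} \le 1\)), together with \(\log y! \le y \log y \le y^{2}\) for \(y \ge 1\) and the Poisson second moment \(\mu + \mu^{2}\), is:

$$ \begin{array}{@{}rcl@{}} E^{*}(Y_{i,k}) \le \alpha^{(M)} < \infty, \qquad E^{*}(\log Y_{i,k}!) \le E^{*}(Y_{i,k}^{2}) \le \alpha^{(M)} + (\alpha^{(M)})^{2} < \infty . \end{array} $$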

Since \(h(Y_{i};{\alpha }_{i},B_{\epsilon}(P)) < c < \infty \) for any P, 𝜖 and αi,

$$ \begin{array}{@{}rcl@{}} E^{*}[ \log(h(Y_{i}; {\alpha}_{i}, B_{\epsilon}(P) ) / h(Y_{i};{\alpha}_{i}, P^{*}) ) ]^{+} < \log(c) - E^{*}(\log h(Y_{i};{\alpha}_{i}, P^{*})) < \infty { .} \end{array} $$

Therefore, we have shown that the four conditions above are satisfied, and hence the penalized MLE under the mixture of Poisson processes model is strongly consistent.

Appendix B: Derivation of EM Algorithm

Recall the complete data log-likelihood defined in Eq. 10. The E-step requires the computation of the expected value of the complete data log-likelihood with respect to the conditional distribution of Z given x and the current parameter estimates \(\hat {\theta }^{(t)}\). This conditional expectation can be expressed as:

$$ \begin{array}{@{}rcl@{}} E_{Z|x,\hat{\theta}^{(t)}}(\log L(\theta;x,Z)) &=& \sum\limits_{i=1}^{n} \sum\limits_{g=1}^{G} \hat{\pi}_{i}^{g} \left[ \log(\tau_{g}) - {\alpha}_{i} + \sum\limits_{j=1}^{n_{i}} \left( \log({\alpha}_{i}) + \log({\lambda}_{g}(x_{ij})) \right) \right] \\ &&+ \sum\limits_{g=1}^{G} \log(p(\mathbf{f}_{g})) \end{array} $$
(16)
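For concreteness, the following minimal sketch shows how the responsibilities \(\hat{\pi}_{i}^{g}\) entering this expectation could be computed on the discretized grid. The function name, the representation of events by their bin indices, and the omission of terms that are constant across clusters (such as those involving \({\alpha}_{i}\)) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp

def e_step_responsibilities(event_bins, tau, F):
    """Hypothetical E-step sketch: responsibilities pi_hat_i^g on a grid of m bins.

    event_bins : list of integer arrays, the bin index of each event x_{ij} for observation i
    tau        : mixing proportions tau_g, length G
    F          : G x m array of values f_{g,k}; log lambda_g is the log-softmax of each row
    Terms common to all clusters (e.g. those involving alpha_i) cancel and are omitted.
    """
    log_lam = F - logsumexp(F, axis=1, keepdims=True)        # log lambda_{g,k} via the logistic transform
    log_post = np.empty((len(event_bins), len(tau)))
    for i, bins_i in enumerate(event_bins):
        log_post[i] = np.log(tau) + log_lam[:, bins_i].sum(axis=1)
    log_post -= logsumexp(log_post, axis=1, keepdims=True)   # normalize on the log scale
    return np.exp(log_post)                                  # rows are pi_hat_i^g, summing to 1
```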

The M-step finds the parameter values θ that maximize this expected complete data log-likelihood. It is straightforward to show (by differentiation) that the maximizing values of α and τ are given by Eqs. 12 and 13, respectively.
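As a sketch of that differentiation (assuming Eq. 16 exactly as displayed above; Eqs. 12 and 13 themselves appear in the main text and are not reproduced here), the terms of Eq. 16 involving \({\alpha}_{i}\) reduce to \(-{\alpha}_{i} + n_{i}\log{\alpha}_{i}\) since \({\sum}_{g} \hat{\pi}_{i}^{g} = 1\), while the terms in τ are maximized subject to \({\sum}_{g} \tau_{g} = 1\), giving:

$$ \begin{array}{@{}rcl@{}} \frac{\partial}{\partial {\alpha}_{i}}\left( -{\alpha}_{i} + n_{i}\log{\alpha}_{i} \right) = -1 + \frac{n_{i}}{{\alpha}_{i}} = 0 \; \Rightarrow \; \hat{\alpha}_{i} = n_{i}, \qquad \hat{\tau}_{g} = \frac{1}{n}\sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} . \end{array} $$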

To optimize Eq. 16 with respect to fg, we write fg = (fg,1,⋯ , fg, m)T and retain terms that involve fg for g = 1,⋯ , G to obtain:

$$ \begin{array}{@{}rcl@{}} Q & \equiv & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} \log({\lambda}_{g}(x_{ij})) + \log(p(\mathbf{f}_{g}) ) \\ & = & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} \log \left[ \frac{ \exp(f_{g}(x_{ij}) ) }{ {\sum}_{k=1}^{m} \exp(f_{g,k}) } \right] - \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \\ & = & \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} f_{g}(x_{ij}) - \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} \log\left[ \sum\limits_{k=1}^{m} \exp(f_{g,k}) \right]- \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \\ & = & \mathbf{b}^{T} \mathbf{f}_{g} - \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} \log\left[ \sum\limits_{k=1}^{m} \exp(f_{g,k}) \right]- \frac{1}{2} \mathbf{f}_{g}^{T} K^{-1} \mathbf{f}_{g} + const \end{array} $$

where b = (b1,⋯ , bm)T with bk defined by:

$$ \begin{array}{@{}rcl@{}} b_{k} = \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} \sum\limits_{j=1}^{n_{i}} I(x_{ij} \in [ s_{k-1}, s_{k} )) \end{array} $$

where [sk−1, sk) is the kth of the m equally spaced intervals of [0, T).
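The weighted bin counts b can be computed directly with a weighted histogram. The following is a minimal sketch for a single cluster g; the function name and the data layout (a list of event-time arrays together with the corresponding responsibilities) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_bin_counts(event_times, resp_g, m, T):
    """Sketch of b for cluster g: b_k = sum_i pi_hat_i^g * #{j : x_ij in [s_{k-1}, s_k)}.

    event_times : list of 1-D arrays, the event times x_{ij} of each observation i
    resp_g      : responsibilities pi_hat_i^g for cluster g, one per observation
    m, T        : number of equally spaced bins on [0, T)
    """
    times = np.concatenate(event_times)
    weights = np.concatenate([np.full(len(x), w) for x, w in zip(event_times, resp_g)])
    b, _ = np.histogram(times, bins=m, range=(0.0, T), weights=weights)
    return b
```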

For g = 1,⋯ , G, we now optimize Q with respect to fg; as no analytical expression for the maximizer exists, we use Newton’s method. For each g, the Jacobian Jg(Q) of Q can be written as:

$$ \begin{array}{@{}rcl@{}} J_{g}(Q) = \mathbf{b} - \mathbf{u}_{g}\sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} - \frac{1}{2}\mathbf{f}_{g}^{T}(K^{-1}+K^{-T}) \end{array} $$

where ug = (ug,1,⋯ , ug, m)T with

$$ \begin{array}{@{}rcl@{}} u_{g,l} = \frac{\exp(f_{g,l})}{{\sum}_{k=1}^{m} \exp(f_{g,k})} \end{array} $$

for l = 1,⋯ , m. The Hessian Hg(Q) can be written as:

$$ \begin{array}{@{}rcl@{}} H_{g}(Q) = - (\text{diag}(\mathbf{u}_{g}) - \mathbf{u}_{g} \mathbf{u}_{g}^{T}) \sum\limits_{i=1}^{n} \hat{\pi}_{i}^{g} n_{i} - \frac{1}{2}(K^{-1}+K^{-T}) \end{array} $$
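The Newton update for each cluster is then \(\mathbf{f}_{g} \leftarrow \mathbf{f}_{g} - H_{g}(Q)^{-1} J_{g}(Q)^{T}\). A minimal sketch of one such update, built from the gradient and Hessian above, is given below; it assumes K is symmetric (so that \(\frac{1}{2}(K^{-1}+K^{-T}) = K^{-1}\)) and uses illustrative variable names rather than the authors' implementation.

```python
import numpy as np

def newton_update_f(f_g, b, N_g, K):
    """One Newton step for f_g using the gradient and Hessian displayed above.

    f_g : current vector f_g of length m
    b   : the weighted bin counts b defined above (length m)
    N_g : sum_i pi_hat_i^g * n_i, the responsibility-weighted number of events
    K   : m x m Gaussian process covariance matrix on the bin grid (symmetric)
    """
    K_inv = np.linalg.inv(K)                      # symmetric K: (K^-1 + K^-T)/2 = K^-1
    u = np.exp(f_g - f_g.max())
    u /= u.sum()                                  # u_g: softmax of f_g
    grad = b - N_g * u - K_inv @ f_g              # J_g(Q) written as a column vector
    hess = -N_g * (np.diag(u) - np.outer(u, u)) - K_inv
    return f_g - np.linalg.solve(hess, grad)      # f_g <- f_g - H_g(Q)^{-1} J_g(Q)
```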


Cite this article

Ng, T.L.J., Murphy, T.B. Model-based Clustering of Count Processes. J Classif 38, 188–211 (2021). https://doi.org/10.1007/s00357-020-09363-4
