
Efficient EM-variational inference for nonparametric Hawkes process

Abstract

The classic Hawkes process assumes a constant baseline intensity and a parametric triggering kernel. In contrast, we present a generalization of the parametric Hawkes process based on a Bayesian nonparametric model called the quadratic Gaussian Hawkes process. We model the baseline intensity and the triggering kernel as quadratic transformations of random trajectories drawn from a Gaussian process (GP) prior. We derive an analytical expression for the EM-variational inference algorithm by augmenting the latent branching structure of the Hawkes process, which embeds the variational Gaussian approximation naturally into the EM framework. We also use a series of schemes based on the sparse GP approximation to accelerate the inference algorithm. Experiments on synthetic and real data show that the underlying baseline intensity and triggering kernel can be recovered efficiently, and that our model achieves superior fitting capability and prediction accuracy compared to state-of-the-art approaches.

Notes

  1. https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95.

  2. https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb.

References

  • Adams, R.P., Murray, I., MacKay, D.J.: Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 9–16. ACM (2009)

  • Bacry, E., Muzy, J.F.: First- and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Trans. Inf. Theory 62(4), 2184–2202 (2016)

  • Bacry, E., Mastromatteo, I., Muzy, J.F.: Hawkes processes in finance. Mark. Microstruct. Liq. 1(1), 1550005 (2015)

  • Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics, 1st edn., corr. 2nd printing. Springer, New York (2007)

  • Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)

  • Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Probability and Its Applications, vol. I. Springer, Berlin (2003)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)

  • Eichler, M., Dahlhaus, R., Dueck, J.: Graphical modeling for multivariate Hawkes processes with nonparametric link functions. J. Time Ser. Anal. 38(2), 225–242 (2017)

  • Flaxman, S., Teh, Y.W., Sejdinovic, D., et al.: Poisson intensity estimation with reproducing kernels. Electron. J. Stat. 11(2), 5081–5104 (2017)

  • Hawkes, A.G.: Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1), 83–90 (1971)

  • Hewlett, P.: Clustering of order arrivals, price impact and trade path optimisation. In: Workshop on Financial Modeling with Jump Processes, École Polytechnique, pp. 6–8 (2006)

  • Lewis, E., Mohler, G.: A nonparametric EM algorithm for multiscale Hawkes processes. J. Nonparametr. Stat. 1(1), 1–20 (2011)

  • Lloyd, C., Gunter, T., Osborne, M., Roberts, S.: Variational inference for Gaussian process modulated Poisson processes. In: International Conference on Machine Learning, pp. 1814–1822 (2015)

  • Marsan, D., Lengline, O.: Extending earthquakes’ reach through cascading. Science 319(5866), 1076–1079 (2008)

  • Mei, H., Eisner, J.M.: The neural Hawkes process: a neurally self-modulating multivariate point process. In: Advances in Neural Information Processing Systems, pp. 6754–6764 (2017)

  • Mohler, G.O., Short, M.B., Brantingham, P.J., Schoenberg, F.P., Tita, G.E.: Self-exciting point process modeling of crime. J. Am. Stat. Assoc. 106(493), 100–108 (2011)

  • Ogata, Y.: Space-time point-process models for earthquake occurrences. Ann. Inst. Stat. Math. 50(2), 379–402 (1998)

  • Opper, M., Archambeau, C.: The variational Gaussian approximation revisited. Neural Comput. 21(3), 786–792 (2009)

  • Reynaud-Bouret, P., Schbath, S., et al.: Adaptive estimation for Hawkes processes; application to genome analysis. Ann. Stat. 38(5), 2781–2822 (2010)

  • Rizoiu, M.A., Mishra, S., Kong, Q., Carman, M., Xie, L.: SIR-Hawkes: linking epidemic models and Hawkes processes to model diffusions in finite populations. In: Proceedings of the 2018 World Wide Web Conference, pp. 419–428 (2018)

  • Samo, Y.L.K., Roberts, S.: Scalable nonparametric Bayesian inference on point processes with Gaussian processes. In: International Conference on Machine Learning, pp. 2227–2236 (2015)

  • Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. In: Artificial Intelligence and Statistics, pp. 567–574 (2009)

  • Wheatley, S., Schatz, M., Sornette, D.: The ARMA point process and its estimation. arXiv preprint arXiv:1806.09948 (2018)

  • Zhang, R., Walder, C., Rizoiu, M.A.: Variational inference for sparse Gaussian process modulated Hawkes process. arXiv preprint arXiv:1905.10496v2 (2019)

  • Zhou, K., Zha, H., Song, L.: Learning triggering kernels for multi-dimensional Hawkes processes. In: International Conference on Machine Learning, pp. 1301–1309 (2013)

  • Zhou, F., Li, Z., Fan, X., Wang, Y., Sowmya, A., Chen, F.: A refined MISD algorithm based on Gaussian process regression. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 584–596. Springer (2018)

Author information

Correspondence to Zhidong Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proof of lower-bound

The lower-bound \(Q(\mu (t),\phi (\tau )|\mu ^{(s)}(t),\phi ^{(s)}(\tau ))\) in Eq. (4) is derived as follows. By Jensen’s inequality, we have

$$\begin{aligned}&\log p(D|\mu (t), \phi (\tau ))=\sum _{i=1}^N \log \left( \mu (t_i)+\sum _{j=1}^{i-1}\phi (t_i-t_j)\right) \nonumber \\&\qquad -\int _0^T\left( \mu (t)+\sum _{t_i<t}\phi (t-t_i)\right) dt\nonumber \\&\quad \ge \sum _{i=1}^N\left( p_{ii}\log \frac{\mu (t_i)}{p_{ii}}+\sum _{j=1}^{i-1}p_{ij}\log \frac{\phi (t_i-t_j)}{p_{ij}}\right) \nonumber \\&\qquad -\int _0^T\mu (t)dt-\sum _{i=1}^N\int _{t_i}^{t_i+T_{\phi }}\phi (t-t_i)dt\nonumber \\&\quad =\sum _{i=1}^Np_{ii}\log \mu (t_i)-\int _0^T\mu (t)dt\nonumber \\&\qquad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\log \phi (t_i-t_j)-\sum _{i=1}^N\int _{t_i}^{t_i+T_\phi }\phi (t-t_i)dt+C\nonumber \\ \end{aligned}$$
(13)

where C is a constant with respect to \(\mu (t)\) and \(\phi (\tau )\), because \(p_{ii}\) and \(p_{ij}\) are fixed quantities computed in the E-step.
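
For concreteness, here is a minimal sketch of the E-step that produces the \(p_{ii}\) and \(p_{ij}\) above: each event is attributed to the background or to an earlier event in proportion to its contribution to the intensity. The NumPy implementation and the name e_step are our own illustration, not the authors' code.

    import numpy as np

    def e_step(times, mu, phi):
        """Branching probabilities of a Hawkes process under the
        current estimates mu^(s)(t) and phi^(s)(tau).

        times : sorted 1-D array of event times t_1 < ... < t_N
        mu    : callable, baseline intensity mu^(s)
        phi   : callable (vectorized), triggering kernel phi^(s)

        Returns p_diag with p_diag[i] = p_ii (event i is a background
        event) and p_off with p_off[i][j] = p_ij (event i was triggered
        by event j < i).
        """
        N = len(times)
        p_diag = np.zeros(N)
        p_off = []
        for i in range(N):
            trig = phi(times[i] - times[:i])    # phi(t_i - t_j) for j < i
            total = mu(times[i]) + trig.sum()   # full intensity at t_i
            p_diag[i] = mu(times[i]) / total
            p_off.append(trig / total)
        return p_diag, p_off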

Analytical solution of ELBO

The KL divergence \(\text {KL}\left( q(\mathbf {u}_f)||p(\mathbf {u}_f)\right) \) between the variational distribution and the prior over the inducing variables \(\mathbf {u}_f\) can be written as

$$\begin{aligned} \begin{aligned} \text {KL}\left( q(\mathbf {u}_f)||p(\mathbf {u}_f)\right) =&\frac{1}{2}\bigg [\text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f)+\log \frac{|\mathbf {K}_{z_f z_f}|}{|\mathbf {S}_f|}\\&-M_f+\mathbf {m}_f^T\mathbf {K}_{z_f z_f}^{-1}\mathbf {m}_f\bigg ], \end{aligned} \end{aligned}$$
(14)

where \(\text {Tr}(\cdot )\) denotes the trace, \(|\cdot |\) denotes the determinant and \(M_f\) is the dimensionality of \(\mathbf {u}_f\).
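
Eq. (14) is the KL divergence between \(q(\mathbf {u}_f)=\mathcal {N}(\mathbf {m}_f,\mathbf {S}_f)\) and the zero-mean prior \(p(\mathbf {u}_f)=\mathcal {N}(\mathbf {0},\mathbf {K}_{z_f z_f})\). Below is a direct NumPy transcription using Cholesky factors for numerical stability (a sketch assuming both matrices are positive definite; the function name is ours):

    import numpy as np

    def kl_term(m_f, S_f, K_zz):
        """KL( N(m_f, S_f) || N(0, K_zz) ), Eq. (14)."""
        M_f = len(m_f)
        L_K = np.linalg.cholesky(K_zz)             # K_zz = L_K @ L_K.T
        L_S = np.linalg.cholesky(S_f)
        A = np.linalg.solve(L_K, L_S)              # L_K^{-1} L_S
        trace_term = np.sum(A ** 2)                # Tr(K^{-1} S)
        logdet_term = 2.0 * (np.log(np.diag(L_K)).sum()
                             - np.log(np.diag(L_S)).sum())
        alpha = np.linalg.solve(L_K, m_f)
        quad_term = alpha @ alpha                  # m^T K^{-1} m
        return 0.5 * (trace_term + logdet_term - M_f + quad_term)

The Cholesky route avoids forming \(\mathbf {K}_{z_f z_f}^{-1}\) explicitly and yields both log-determinants as byproducts.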

The last two terms in Eq. (8) have analytical solutions (Lloyd et al. 2015)

$$\begin{aligned} \int _0^T\mathbb {E}^2_{q(f)}[f(t)]dt=\mathbf {m}_f^T\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f\mathbf {K}^{-1}_{z_f z_f}\mathbf {m}_f, \end{aligned}$$
(15)
$$\begin{aligned} \begin{aligned} \int _0^T\text {Var}_{q(f)}[f(t)]dt=&\theta _0^fT-\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)+\\&\text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f\mathbf {K}_{z_f z_f}^{-1}\varvec{\Psi }_f), \end{aligned} \end{aligned}$$
(16)

where \(\varPsi _f(z_f,z_f^\prime )=\int _0^Tk(z_f,t)k(t,z_f^\prime )dt\). For the squared exponential kernel, \(\varPsi _f\) can be written as (Lloyd et al. 2015)

$$\begin{aligned} \varPsi _f(z_f,z_f^\prime )=&-\frac{(\theta _0^f)^2}{2}\sqrt{\frac{\pi }{\theta _1^f}}\exp \left( -\frac{\theta _1^f(z_f-z_f^\prime )^2}{4}\right) \nonumber \\&\left[ \text {erf}\left( \sqrt{\theta _1^f}(\bar{z}_f-T)\right) -\text {erf}\left( \sqrt{\theta _1^f}\bar{z}_f\right) \right] ,\nonumber \\ \end{aligned}$$
(17)

where \(\text {erf}(\cdot )\) is the Gauss error function and \(\bar{z}_f=(z_f+z_f^\prime )/2\).
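
Below is a sketch of Eqs. (15)–(17) in NumPy/SciPy. We assume the squared exponential kernel in the parameterization \(k(z,t)=\theta _0^f\exp (-\theta _1^f(z-t)^2/2)\), which is the form consistent with the constant and the bracket in Eq. (17); the function names are our own.

    import numpy as np
    from scipy.special import erf

    def psi_matrix(z, theta0, theta1, T):
        """Psi_f(z, z') = int_0^T k(z, t) k(t, z') dt, Eq. (17),
        for k(z, t) = theta0 * exp(-theta1 * (z - t)**2 / 2)."""
        diff = z[:, None] - z[None, :]             # z - z'
        zbar = 0.5 * (z[:, None] + z[None, :])     # (z + z') / 2
        pref = -(theta0 ** 2) / 2.0 * np.sqrt(np.pi / theta1)
        gauss = np.exp(-theta1 * diff ** 2 / 4.0)
        bracket = (erf(np.sqrt(theta1) * (zbar - T))
                   - erf(np.sqrt(theta1) * zbar))
        return pref * gauss * bracket

    def integral_terms(m_f, S_f, K_zz, Psi, theta0, T):
        """Eq. (15) and Eq. (16): integrated squared mean and
        integrated variance of f(t) under q(f)."""
        Kinv_m = np.linalg.solve(K_zz, m_f)        # K^{-1} m
        int_sq_mean = Kinv_m @ Psi @ Kinv_m        # Eq. (15)
        Kinv_Psi = np.linalg.solve(K_zz, Psi)      # K^{-1} Psi
        Kinv_S = np.linalg.solve(K_zz, S_f)        # K^{-1} S
        int_var = (theta0 * T                      # Eq. (16)
                   - np.trace(Kinv_Psi)
                   + np.trace(Kinv_S @ Kinv_Psi))
        return int_sq_mean, int_var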

The first term in Eq. (8) also has an analytical solution (Lloyd et al. 2015)

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{q(f)}\left[ \log f^2(t_i)\right] \\&\quad =\int _{-\infty }^\infty \log f^2(t_i)\mathcal {N}(f(t_i)|\tilde{v}_f(t_i),{\tilde{\sigma }}^2_f(t_i))df(t_i)\\&\quad =-\tilde{G}\left( -\frac{\tilde{v}^2_f(t_i)}{2{\tilde{\sigma }}^2_f(t_i)}\right) +\log \left( \frac{{\tilde{\sigma }}^2_f(t_i)}{2}\right) -C, \end{aligned} \end{aligned}$$
(18)

where \({\tilde{\sigma }}^2_f(t_i)\) is the diagonal entry of \({\tilde{\varSigma }}_f(t,t^\prime )\) in Eq. (7) at \(t_i\), C is the Euler–Mascheroni constant (\(\approx 0.57721566\)) and \(\tilde{G}(z)\) is a special case of the partial derivative of the confluent hypergeometric function \(_1F_1(a,b,z)\) (Lloyd et al. 2015)

$$\begin{aligned} \tilde{G}(z)={_1F_1}^{(1,0,0)}(0,0.5,z). \end{aligned}$$
(19)

It is worth noting that \(\tilde{G}(z)\) never needs to be evaluated during inference. In fact, we only need \(\tilde{G}(0)=0\), because \(\mathbf {m}_f^*=\mathbf {0}\) as shown in the section on inference speed-up.
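
Concretely, with \(\mathbf {m}_f^*=\mathbf {0}\) the posterior mean \(\tilde{v}_f(t_i)\) vanishes, so the argument of \(\tilde{G}\) is zero and Eq. (18) collapses to \(\log ({\tilde{\sigma }}^2_f(t_i)/2)-C\). A sketch of this simplified per-event term (our own naming):

    import numpy as np

    EULER_MASCHERONI = 0.5772156649015329          # the constant C in Eq. (18)

    def expected_log_f_squared(sigma2_f):
        """E_q[log f^2(t_i)] from Eq. (18) when v~_f(t_i) = 0,
        using G~(0) = 0; sigma2_f may be a scalar or an array."""
        return np.log(sigma2_f / 2.0) - EULER_MASCHERONI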

Matrix derivative of ELBO

Given \(\mathbf {m}_f=\mathbf {0}\), \(\text {ELBO}_\mu \) can be written as

$$\begin{aligned}&\text {ELBO}_\mu \nonumber \\&\quad =-\left( \theta _0^fT-\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)+\text {Tr}(\mathbf {K}^{-1}_{z_f z_f}\mathbf {S}_f\mathbf {K}^{-1}_{z_f z_f}\varvec{\Psi }_f)\right) \nonumber \\&\qquad +\sum _{i=1}^Np_{ii}\left( -\tilde{G}(0)+\log ({\tilde{\sigma }}^2_f(t_i))-\log 2-C\right) \nonumber \\&\qquad -\frac{1}{2}\left( \text {Tr}(\mathbf {K}_{z_f z_f}^{-1}\mathbf {S}_f)+\log |\mathbf {K}_{z_f z_f}|-\log |\mathbf {S}_f|-M_f\right) .\nonumber \\ \end{aligned}$$
(20)
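
Assembling the pieces computed above, Eq. (20) is a one-line combination of the integrated variance (Eq. (16) with \(\mathbf {m}_f=\mathbf {0}\)), the weighted per-event terms of Eq. (18), and the KL term of Eq. (14) at \(\mathbf {m}_f=\mathbf {0}\). A sketch reusing the helpers defined earlier (our own naming):

    import numpy as np

    EULER_MASCHERONI = 0.5772156649015329

    def elbo_mu(int_var, sigma2, p_diag, kl):
        """Eq. (20). int_var is Eq. (16), sigma2 the array of
        sigma~_f^2(t_i), p_diag the p_ii, kl is Eq. (14) at m_f = 0."""
        data_term = np.sum(p_diag * (np.log(sigma2 / 2.0)
                                     - EULER_MASCHERONI))
        return -int_var + data_term - kl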

If \(\mathbf {S}_f\) is symmetric, \(\partial \text {ELBO}_\mu /\partial \mathbf {S}_f\) can be written as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\mu }{\partial \mathbf {S}_f}&=-(2\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}-\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}\circ \mathbf {I})\\&\quad +\sum _{i=1}^Np_{ii}\bigg (2\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\\&\quad -\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}\bigg )/{\tilde{\sigma }}_f^2(t_i)\\&\quad -\frac{1}{2}\left( 2\mathbf {K}^{-1}_{z_f z_f}-\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}-(2\mathbf {S}_f^{-1}-\mathbf {S}_f^{-1}\circ \mathbf {I})\right) , \end{aligned} \end{aligned}$$
(21)

where \(\mathbf {I}\) denotes the identity matrix, \(\circ \) denotes the Hadamard (elementwise) product and \({\tilde{\sigma }}_f^2(t_i)=\theta _0^f-\mathbf {k}_{t_i z_f}\mathbf {K}_{z_f z_f}^{-1}\mathbf {k}_{z_f t_i}+\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\mathbf {S}_f\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\) is the diagonal entry of \({\tilde{\varSigma }}_f(t,t^\prime )\) in Eq. (7).

If \(\mathbf {S}_f\) is diagonal, \(\partial \text {ELBO}_\mu /\partial \mathbf {S}_f\) can be further simplified as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\mu }{\partial \mathbf {S}_f}&=-\mathbf {K}_{z_f z_f}^{-1}\varPsi _f\mathbf {K}_{z_f z_f}^{-1}\circ \mathbf {I}\\&\quad +\sum _{i=1}^Np_{ii}\frac{\mathbf {K}^{-1}_{z_f z_f}\mathbf {k}_{z_f t_i}\mathbf {k}_{t_i z_f}\mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}}{{\tilde{\sigma }}_f^2(t_i)}\\&\quad -\frac{1}{2}\left( \mathbf {K}^{-1}_{z_f z_f}\circ \mathbf {I}-\mathbf {S}_f^{-1}\right) . \end{aligned} \end{aligned}$$
(22)
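
A NumPy sketch of Eq. (22) follows (the vectorization and names are our own). The same routine also yields Eq. (25), as shown after that equation below, once the branching structure is flattened:

    import numpy as np

    def elbo_mu_grad_S_diag(K_zz, Psi, k_zt, p_diag, theta0, s_diag):
        """Eq. (22): gradient of ELBO_mu w.r.t. diagonal S_f = diag(s_diag).

        K_zz   : (M, M) kernel matrix on the inducing points z_f
        Psi    : (M, M) Psi_f matrix, Eq. (17)
        k_zt   : (M, N) cross-kernel; column i is k_{z_f t_i}
        p_diag : (N,) branching probabilities p_ii
        Returns the (M,) diagonal of the gradient.
        """
        K_inv = np.linalg.inv(K_zz)
        A = K_inv @ k_zt                       # column i is K^{-1} k_{z_f t_i}
        # sigma~_f^2(t_i) for diagonal S_f (see the remark after Eq. (21))
        sigma2 = (theta0
                  - np.einsum('mi,mi->i', k_zt, A)
                  + np.einsum('mi,m,mi->i', A, s_diag, A))
        term1 = -np.diag(K_inv @ Psi @ K_inv)  # -(K^{-1} Psi K^{-1}) o I
        term2 = (A ** 2 / sigma2) @ p_diag     # weighted rank-one diagonals
        term3 = -0.5 * (np.diag(K_inv) - 1.0 / s_diag)
        return term1 + term2 + term3

With a diagonal \(\mathbf {S}_f\), only the \(M_f\) diagonal entries are free parameters, which is exactly the simplification Eq. (22) exploits.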

Similarly, given \(\mathbf {m}_g=\mathbf {0}\), \(\text {ELBO}_\phi \) can be written as

$$\begin{aligned}&\text {ELBO}_\phi \nonumber \\&\quad =-\sum _{i=1}^N\left( \theta _0^gT_\phi -\text {Tr}(\mathbf {K}^{-1}_{z_g z_g}\varvec{\Psi }_g)+\text {Tr}(\mathbf {K}^{-1}_{z_g z_g}\mathbf {S}_g\mathbf {K}^{-1}_{z_g z_g}\varvec{\Psi }_g)\right) \nonumber \\&\qquad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\left( -\tilde{G}(0)+\log ({\tilde{\sigma }}^2_g(\tau _{ij}))-\log 2-C\right) \nonumber \\&\qquad -\frac{1}{2}\left( \text {Tr}(\mathbf {K}_{z_g z_g}^{-1}\mathbf {S}_g)+\log |\mathbf {K}_{z_g z_g}|-\log |\mathbf {S}_g|-M_g\right) .\nonumber \\ \end{aligned}$$
(23)

If \(\mathbf {S}_g\) is symmetric, \(\partial \text {ELBO}_\phi /\partial \mathbf {S}_g\) can be written as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\phi }{\partial \mathbf {S}_g}&=-\sum _{i=1}^N(2\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}-\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}\circ \mathbf {I})\\&\quad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\bigg (2\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\\&\quad -\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}\bigg )/{\tilde{\sigma }}_g^2(\tau _{ij})\\&\quad -\frac{1}{2}\left( 2\mathbf {K}^{-1}_{z_g z_g}-\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}-(2\mathbf {S}_g^{-1}-\mathbf {S}_g^{-1}\circ \mathbf {I})\right) , \end{aligned} \end{aligned}$$
(24)

where \({\tilde{\sigma }}_g^2(\tau _{ij})=\theta _0^g-\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}_{z_g z_g}^{-1}\mathbf {k}_{z_g \tau _{ij}}+\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\mathbf {S}_g\mathbf {K}^{-1}_{z_g z_g} \mathbf {k}_{z_g \tau _{ij}}\) is the diagonal entry of \({\tilde{\varSigma }}_g(\tau ,\tau ^\prime )\).

If \(\mathbf {S}_g\) is diagonal, \(\partial \text {ELBO}_\phi /\partial \mathbf {S}_g\) can be further simplified as

$$\begin{aligned} \begin{aligned} \frac{\partial \text {ELBO}_\phi }{\partial \mathbf {S}_g}&=-\sum _{i=1}^N\mathbf {K}_{z_g z_g}^{-1}\varPsi _g\mathbf {K}_{z_g z_g}^{-1}\circ \mathbf {I}\\&\quad +\sum _{i=2}^N\sum _{j=1}^{i-1}p_{ij}\frac{\mathbf {K}^{-1}_{z_g z_g}\mathbf {k}_{z_g \tau _{ij}}\mathbf {k}_{\tau _{ij} z_g}\mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}}{{\tilde{\sigma }}_g^2(\tau _{ij})}\\&\quad -\frac{1}{2}\left( \mathbf {K}^{-1}_{z_g z_g}\circ \mathbf {I}-\mathbf {S}_g^{-1}\right) . \end{aligned} \end{aligned}$$
(25)
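
As a usage sketch (all variable names here are hypothetical), the helper from Eq. (22) evaluates Eq. (25) once the branching structure is flattened: the lags \(\tau _{ij}\) play the role of the \(t_i\), the \(p_{ij}\) the role of the \(p_{ii}\), and the \(\varvec{\Psi }_g\) term carries the extra factor N:

    # Hypothetical inputs, for illustration only:
    # tau_flat : 1-D array of all lags tau_ij = t_i - t_j (j < i)
    # p_flat   : matching branching probabilities p_ij from the E-step
    # K_gg     : kernel matrix on the inducing points z_g
    # Psi_g    : Psi matrix for g; k_g_tau : cross-kernel at the lags
    # Passing N * Psi_g reproduces the summed first term of Eq. (25).
    grad_S_g = elbo_mu_grad_S_diag(K_gg, N * Psi_g, k_g_tau,
                                   p_flat, theta0_g, s_g)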

About this article

Cite this article

Zhou, F., Luo, S., Li, Z. et al. Efficient EM-variational inference for nonparametric Hawkes process. Stat Comput 31, 46 (2021). https://doi.org/10.1007/s11222-021-10021-x