Abstract
With the development of the Internet and the formal arrival of the era of big data, people record, store and process data in electronic form, while the bulk of the data in the real life are asynchronous event sequence data. For modeling the asynchronous event sequence, neural point process is one of the most mainstream solutions. With the more and more in-depth study of neural point process, in order to boost the prediction accuracy of the model, the complexity of the model cannot be overestimated, or the selected model itself has more nonlinearity. For example, the neural point process based on attention mechanism will lead to great complexity of the model. Meanwhile, with the development of deep learning, people find that the traditional multi-layer perceptron has great potential. Now many model architectures built with pure multi-layer perceptron without attention have been proposed, and the effect is better than the attention mechanism. Therefore, the multi-layer perceptron has been "reborn" and has attracted extensive attention. Inspired by this, we propose the Linear Normalization Attention Hawkes Process (LNAHP), which substitutes the multi-head dot-product attention from the transformer for linear normalization attention, and learns the hidden representation through two linear transformation layers and normalization operation, which markedly reduces the complexity of the model. The performance for different evaluation metrics for the LNAHP is verified and compared to the current baselines on real datasets from different fields and synthetic datasets, which proves the effectiveness of the LNAHP.







Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Dubey M, Palakkadavath R, Srijith PK (2021) Bayesian neural Hawkes process for event uncertainty prediction. arXiv preprint arXiv:2112.14474
Bacry E, Dayri K, Muzy JF (2012) Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. Eur Phys J B 85(5):1–12
Aït-Sahalia Y, Cacho-Diaz J, Laeven RJA (2015) Modeling financial contagion using mutually exciting jump processes. J Financ Econ 117(3):585–606
Reynaud-Bouret P, Schbath S (2010) Adaptive estimation for Hawkes processes; application to genome analysis. Ann Stat 38(5):2781–2822
Mohler GO, Short MB, Brantingham PJ et al (2011) Self-exciting point process modeling of crime. J Am Stat Assoc 106(493):100–108
Ogata Y (1999) Seismicity analysis through point-process modeling: a review. Seismicity patterns, their statistical significance and physical meaning. Science 8:471–507
Zhou F, Kong Q, Zhang Y, Feng C, Zhu J (2021) Nonlinear Hawkes processes in time-varying system. arXiv preprint arXiv:2106.04844
Wang L, Zhang W, He X et al. (2018) Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, pp 2447–2456
Zhou K, Zha H, Song L (2013) Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. Artif Intell Stat PMLR 2013:641–649
Errais E, Giesecke K, Goldberg LR (2010) Affine point processes and portfolio credit risk. SIAM J Financ Math 1(1):642–665
Daley DJ, Vere-Jones D (2003) An introduction to the theory of point processes: volume I: elementary theory and methods. Springer, New York
Cox DR, Isham V (1980) Point processes. CRC Press, London
Lewis PAW (1964) A branching Poisson process model for the analysis of computer failure patterns. J Roy Stat Soc Ser B (Methodol) 26(3):398–441
Hawkes AG (1971) Spectra of some self-exciting and mutually exciting point processes. Biometrika 58(1):83–90
Liniger TJ (2009) Multivariate Hawkes processes. ETH Zurich, New York
Hewlett P (2006) Clustering of order arrivals, price impact and trade path optimization. Workshop on financial modeling with jump processes. Ecole Polytechnique. 5:6–8
Bacry E, Mastromatteo I, Muzy JF (2015) Hawkes processes in finance. Market Microstruct Liq 1(01):1550005
Embrechts P, Liniger T, Lin L (2011) Multivariate Hawkes processes: an application to financial data. J Appl Probab 48(A):367–378
Large J (2007) Measuring the resiliency of an electronic limit order book. J Financ Mark 10(1):1–25
Gusto G, Schbath S (2005) FADO: a statistical method to detect favored or avoided distances between occurrences of motifs using the Hawkes’ model. Stat Appl Genet Mol Biol 4(1):889
Johnson SD, Bernasco W, Bowers KJ et al (2007) Space–time patterns of risk: a cross national assessment of residential burglary victimization. J Quant Criminol 23(3):201–219
Vere-Jones D, Davies RB (1966) A statistical survey of earthquakes in the main seismic region of New Zealand: part 2—time series analyses. NZ J Geol Geophys 9(3):251–284
Vere-Jones D (1970) Stochastic models for earthquake occurrence. J R Stat Soc Ser B (Methodol) 32(1):1–45
Hu J, Perer A, Wang F (2016) Data driven analytics for personalized healthcare. Healthcare information management systems. Springer, Cham, pp 529–554
Sun L, Liu C, Guo C, et al. (2016) Data-driven automatic treatment regimen development and recommendation. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1865–1874
Goyal A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on Web search and data mining, pp 241–250
Zhao Q, Erdogdu MA, He HY et al. (2015) Seismic: a self-exciting point process model for predicting tweet popularity. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1513–1522
Kobayashi R, Lambiotte R (2016) Tideh: time-dependent Hawkes process for predicting retweet dynamics. In: Proceedings of the international AAAI conference on web and social media, vol 10, no 1
Zhou K, Zha H, Song L (2013) Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. Artif Intell Stat PMLR 5:641–649
Myers S, Leskovec J (2010) On the convexity of latent social network inference. Adv Neural Inform Process Syst 23:5566
Giesecke K, Goldberg LR, Ding X (2011) A top-down approach to multiname credit. Oper Res 59(2):283–300
Cryer JD (1986) Time series analysis. Duxbury Press, Boston
Soderland S, Kim G L, Hawkins N (xxxx) A language model for extracting implicit relations
Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349(6245):261–266
Chung J, Gulcehre C, Cho KH et al. (2010) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Johnson R, Zhang T (2015) Semi-supervised convolutional neural networks for text categorization via region embedding. Adv Neural Inform Process Syst 28:888
Nguyen TH, Grishman R (2015) Relation extraction: perspective from convolutional neural networks. In: Proceedings of the 1st workshop on vector space modeling for natural language processing, pp 39–48
Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inform Process Syst 2:30
Du N, Dai H, Trivedi R et al. (2016) Recurrent marked temporal point processes: Embedding event history to vector. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1555–1564
Xiao S, Yan J, Yang X et al. (2018) Modeling the intensity function of point process via recurrent neural networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 31, no 1
Mei H, Eisner JM (2017) The neural Hawkes process: a neurally self-modulating multivariate point process. Adv Neural Inform Process Syst 2:30
Zhang Q, Lipani A, Kirnap O et al. (2020) Self-attentive Hawkes process. In: International conference on machine learning. PMLR, pp 11183–11193
Zuo S, Jiang H, Li Z et al. (2020) Transformer Hawkes process. In: International conference on machine learning. PMLR, pp 11692–11702
Zhang L, Liu J, Song Z et al. (2021) Universal transformer Hawkes process. In: 2021 international joint conference on neural networks (IJCNN). IEEE, pp 1–7
Joseph S, Kashyap LD, Jain S (2020) Shallow Neural Hawkes: Non-parametric kernel estimation for Hawkes processes. arXiv preprint arXiv:2006.02460
Tolstikhin IO, Houlsby N, Kolesnikov A et al. (2021) Mlp-mixer: an all-mlp architecture for vision. In: Advances in neural information processing systems, pp 34
Melas-Kyriazi L (2021) Do you even need attention? a stack of feed-forward layers does surprisingly well on imagenet. arXiv preprint arXiv:2105.02723
Ding X, Xia C, Zhang X et al. (2021) Repmlp: re-parameterizing convolutions into fully-connected layers for image recognition. arXiv preprint arXiv:2105.01883
Gallager RG (1996) Poisson processes. Discrete stochastic processes. Springer, Boston, pp 31–55
Pemantle R (2007) A survey of random processes with reinforcement. Probab Surv 4:1–79
Isham V, Westcott M (1979) A self-correcting point processes. Stochastic Process Appl 8(3):335–347
Zhou K, Zha H, Song L (2013) Learning triggering kernels for multi-dimensional Hawkes processes. In: International conference on machine learning. PMLR, pp 1301–1309
Malaviya J (2021) Survey on modeling intensity function of Hawkes process using neural models. arXiv preprint arXiv:2104.11092
Dehghani M, Gouws S, Vinyals O et al. (2018) Universal transformers. arXiv preprint arXiv:1807.03819
Dai Z, Yang Z, Yang Y et al. (2019) Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860
Guo MH, Liu ZN, Mu TJ et al. (2021) Beyond self-attention: external attention using two linear layers for visual tasks. arXiv preprint arXiv:2105.02358
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 25:889
Ba JL, Kiros JR, Hinton GE (2016) Layer normalization. arXiv preprint arXiv:1607.06450
He K, Zhang X, Ren S et al. (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Ozaki T (1979) Maximum likelihood estimation of Hawkes’ self-exciting point processes. Ann Inst Stat Math 31(1):145–155
Xu H, Farajtabar M, Zha H (2016) Learning granger causality for Hawkes processes. In: International conference on machine learning. PMLR, pp 1717–1726
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Kingma D P, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Hildebrand FB (1987) Introduction to numerical analysis. Courier Corporation, London
Robert CP, Casella G, Casella G (1999) Monte Carlo statistical methods. Springer, New York
Johnson AEW, Pollard TJ, Shen L et al (2016) MIMIC-III, a freely accessible critical care database. Scientific data 3(1):1–9
Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Linear Normalization Attention Neural Hawkes Process.”
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Song, Zy., Liu, Jw., Yang, J. et al. Linear normalization attention neural Hawkes process. Neural Comput & Applic 35, 1025–1039 (2023). https://doi.org/10.1007/s00521-022-07821-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-022-07821-1