Abstract
Responsive management of public transport nodes relies on constant monitoring of service quality. Social media content provides a unique opportunity to detect and monitor events that affect service quality at these nodes, and to predict future occurrences of such events. However, the confined geographic area of transport nodes exacerbates the sparsity of available feeds, raising two major challenges: limited observations, which lead to biased models, and the asynchronous nature of observations, which impedes the detection of causal patterns. This paper therefore proposes a framework based on a multivariate Hawkes point process and sentiment analysis. The multivariate Hawkes point process models events in continuous time without discretising them; it is therefore less affected by data sparsity than time series models, while still enabling the prediction of how certain events can trigger future events. In addition, the sentiments extracted from social media feeds provide further knowledge about passengers’ perception and are used in our approach to strengthen the model. Experiments on a real-world dataset demonstrate the effectiveness of the model in identifying causal relations across public transport nodes. They also show the efficacy of the proposed solution in predicting events in this limited context compared with state-of-the-art approaches.
Data availability
The code that supports the findings of this study is available at https://github.com/mmrahimi/SentiHawkes. Due to Twitter's terms of service, tweets cannot be shared directly. However, the tweet IDs, together with their corresponding labels and sentiment scores, are available in the provided repository.
Abbreviations
- \(\lambda (t)\): The event arrival rate with respect to time t, where \(\lambda \in \mathbb {R}^+\)
- \(\mathcal {S}(t)\): A set of events occurring up to time t
- \(\mu\): The background event arrival rate
- \(\phi (t)\): The effect of past events on the probability of a new event occurring at time t
- \(\tau\): The time interval between two event arrivals
- \(N(t)\): The number of events that have happened up to time t
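As a minimal sketch of how these symbols fit together, the standard univariate Hawkes intensity can be written as \(\lambda (t) = \mu + \sum _{t_i \in \mathcal {S}(t)} \phi (t - t_i)\), with \(\tau = t - t_i\) the time elapsed since a past event. The Python snippet below evaluates this expression under an assumed exponential kernel \(\phi (\tau ) = \alpha \beta e^{-\beta \tau }\). The kernel form, the parameter values (`alpha`, `beta`, `mu`) and all function names are illustrative assumptions; they are not taken from the paper or its repository, and the multivariate, sentiment-aware model is not reproduced here.

```python
import math


def exp_kernel(tau, alpha=0.5, beta=1.0):
    """Assumed kernel phi(tau) = alpha * beta * exp(-beta * tau) for tau > 0."""
    if tau <= 0:
        return 0.0  # an event has no effect before (or exactly when) it occurs
    return alpha * beta * math.exp(-beta * tau)


def intensity(t, events, mu=0.2, kernel=exp_kernel):
    """lambda(t) = mu + sum of phi(t - t_i) over events t_i strictly before t."""
    return mu + sum(kernel(t - t_i) for t_i in events if t_i < t)


def count(t, events):
    """N(t): number of events that have happened up to and including time t."""
    return sum(1 for t_i in events if t_i <= t)


# Hypothetical event timestamps (e.g. hours since the start of observation).
events = [1.0, 1.5, 4.0]

for t in (0.5, 2.0, 4.1):
    print(f"t={t}: N(t)={count(t, events)}, lambda(t)={intensity(t, events):.4f}")
```

Before the first event the intensity equals the background rate \(\mu\); each arrival then raises it temporarily, which is how past events make new events more likely without the timeline being binned into discrete intervals.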
Acknowledgements
The authors acknowledge assistance and advice from Amir Khodabandeh on statistical analysis.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
About this article
Cite this article
Rahimi, M.M., Naghizade, E., Stevenson, M. et al. SentiHawkes: a sentiment-aware Hawkes point process to model service quality of public transport using Twitter data. Public Transp 15, 343–376 (2023). https://doi.org/10.1007/s12469-022-00310-7
DOI: https://doi.org/10.1007/s12469-022-00310-7