Syndromic surveillance of Flu on Twitter using weakly supervised temporal topic models

Chen, Liangzhe; Tozammel Hossain, K. S. M.; Butler, Patrick; Ramakrishnan, Naren; Prakash, B. Aditya

doi:10.1007/s10618-015-0434-x

Published: 08 September 2015

Volume 30, pages 681–710, (2016)

Data Mining and Knowledge Discovery

Liangzhe Chen ORCID: orcid.org/0000-0001-8189-7910¹,
K. S. M. Tozammel Hossain¹,
Patrick Butler¹,
Naren Ramakrishnan¹ &
…
B. Aditya Prakash¹

Abstract

Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the Flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. However, there is a disconnect between data-driven methods for forecasting Flu incidence and epidemiological models that adopt a state-based understanding of transitions, which can lead to sub-optimal predictions. Furthermore, models of epidemiological activity and models of social activity, such as activity on Twitter, predict different shapes and have important differences. In this paper, we propose two temporal topic models (one unsupervised model and one improved weakly supervised model) to capture the hidden states of a user from their tweets and to aggregate states in a geographical region for better estimation of trends. We show that our approaches help fill the gap between phenomenological methods for disease surveillance and epidemiological models. We validate our approaches by modeling the Flu using Twitter in multiple countries of South America. We demonstrate that our models consistently outperform plain vocabulary assessment in Flu case-count predictions and, at the same time, give better Flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models.

Notes

Code and vocabulary can be found here: http://people.cs.vt.edu/liangzhe/code/hfstm-a.html.
http://datasift.com/.
http://www.basistech.com.
http://www.google.org/flutrends.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1353346, by the Maryland Procurement Office under Contract H98230-14-C-0127, by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) Contract Number D12PC000337, and by the VT College of Engineering. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the respective funding agencies.

Author information

Authors and Affiliations

Department of Computer Science, Virginia Tech., 114 McBryde Hall (0106), Blacksburg, VA, 24061, USA
Liangzhe Chen, K. S. M. Tozammel Hossain, Patrick Butler, Naren Ramakrishnan & B. Aditya Prakash

Corresponding author

Correspondence to Liangzhe Chen.

Additional information

Responsible editor: Charu Aggarwal.

Appendix

1.1 HFSTM-A-FIT

In this appendix, we show the equations we designed for HFSTM-A-FIT. Because the outline of the HFSTM-FIT algorithm is similar to that of HFSTM-A-FIT, one can derive the equations for HFSTM-FIT from the content shown below.

Let K, T, N, and U be the number of states, the number of tweets per user, the number of words per tweet, and the total number of users, respectively. Let $O=\langle O_1,O_2,\ldots ,O_T\rangle $ and $S=\langle S_1,S_2,\ldots ,S_T\rangle $ be the observed sequence of tweets and the sequence of hidden states, respectively, for a particular user.

Here is a list of symbols that we will use.

1. $\epsilon $: the prior for the binary state switching variable, which determines whether the state of a tweet is drawn from the transition probability matrix or simply copied from the state of the previous tweet (a number in (0, 1])
2. $\pi $: the initial state probability (size $1\times K$)
3. $\eta $: the transition probability matrix (size $K\times K$)
4. $\phi $: the word distribution for each state (size $K\times W$, where W is the total number of keywords for all of the states)
5. $w_{tn}$: the nth word in the tth tweet
6. $\lambda $: the background switch variable
7. c: the topic switch variable
8. y: the observed aspect value

For HFSTM-A, as mentioned in Sect. 3.3, the values of $\lambda $ and c are biased by the observed aspect value y. For brevity, we write $\lambda $ and c instead of $\lambda _{y}$ and $c_{y}$ in the following, but the values used in the equations are actually calculated as:

$$\begin{aligned} \lambda _{y_i=0}&=\lambda \\ \lambda _{y_i=1}&=\lambda +b \,\times \,(1-\lambda )\\ c_{y_i=0}&=c-a \,\times \,c\\ c_{y_i=1}&=c+a \,\times \,(1-c) \end{aligned}$$
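The adjustment above can be sketched in a few lines of Python. This is only an illustration: the function name `bias_by_aspect` and the example values of a and b are assumptions made here, not taken from the paper, and all four inputs are assumed to lie in [0, 1].

```python
def bias_by_aspect(lam, c, y, a, b):
    """Return (lambda_y, c_y) for an observed aspect value y in {0, 1}.

    lam, c: unbiased switch values; a, b: bias strengths.
    All four are assumed (for this sketch) to lie in [0, 1].
    """
    if y == 1:
        # Aspect observed: move both values toward 1.
        return lam + b * (1.0 - lam), c + a * (1.0 - c)
    # Aspect not observed: lambda is unchanged, c shrinks toward 0.
    return lam, c - a * c


if __name__ == "__main__":
    print(bias_by_aspect(0.5, 0.4, y=1, a=0.5, b=0.2))
    print(bias_by_aspect(0.5, 0.4, y=0, a=0.5, b=0.2))
```

Under these assumptions both outputs remain valid probabilities, because each update is a convex combination of the old value with 0 or 1.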

We want to learn all the parameters given the tweet sequence. For compact notation, we use $H=(\epsilon ,\pi ,\eta ,\phi ,\lambda ,c)$. In HFSTM-A-FIT, we use the forward-backward procedure, for which we define the forward variable $A_t(i)$ and the backward variable $B_t(i)$ as follows.

$$\begin{aligned} A_t(i)&= P(O_1,O_2,\ldots ,O_t, S_t =i|H)\\ B_t(i)&= P(O_{t+1},\ldots ,O_T|S_t =i,H) \end{aligned}$$

Let $\gamma _t(i)$ be the probability of being in state $S_i$ for the tth tweet, given the observed tweet sequence O and the other model parameters. For each user, the size of $\gamma $ is $2K\times T$. In the recursions below, the first K states are those reached through a transition drawn from $\eta $, and the second K states are copies of the previous tweet's state. This probability can be expressed in terms of the forward and backward probabilities.

$$\begin{aligned} \gamma _t(i)&= P(S_t =i|O,H)\\&= \frac{A_t(i)B_t(i)}{P(O|H)}\\&= \frac{A_t(i)B_t(i)}{\sum _{j=1}^{2K} A_t(j)B_t(j)} \end{aligned}$$
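As a small illustration of this normalization, the sketch below computes $\gamma _t$ for a single tweet from its forward and backward values. The function name `state_posteriors` and the plain-list representation are choices made for this example only.

```python
def state_posteriors(A_t, B_t):
    """gamma_t(i) = A_t(i) * B_t(i) / sum_j A_t(j) * B_t(j).

    A_t, B_t: forward and backward values for one tweet,
    each a list over the 2K expanded states.
    """
    joint = [a * b for a, b in zip(A_t, B_t)]
    total = sum(joint)  # this sum equals P(O | H)
    if total == 0.0:
        raise ValueError("observation sequence has zero likelihood")
    return [v / total for v in joint]


if __name__ == "__main__":
    print(state_posteriors([0.1, 0.2, 0.0, 0.1], [1.0, 1.0, 1.0, 1.0]))
```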

We have two switch variables in the model, l and x. If $l=1$, the word is generated by either the states or the topics; if $l=0$, it is generated by the background. If $x=0$, the word is generated by the topics; if $x=1$, it is generated by the states.

For $l_i=1$, $w_i$ is generated by either the states or the topics.

$$\begin{aligned}&P(l_i=1| \lambda ,c,H,w) =\frac{ P(l_i=1|\lambda ,c,H)P(w|l_i=1,\lambda ,c,H)}{ P(w|\lambda ,c,H)}\\&= \textstyle \frac{ \lambda P(w_i|\lambda ,c,H,l_i=1,w_{-i})P(w_{-i}|\lambda ,c,H,l_i=1)}{ P(w_i|\lambda ,c,H,w_{-i})P(w_{-i}|\lambda ,c,H)}\\&= \textstyle \frac{ \lambda \sum _{x_i} [P(w_i|\lambda ,c,H,l_i=1,x_i,w_{-i})P(x_i|\lambda ,c,H,l_i=1,w_{-i})]}{ \sum _{l_i}[P(w_i|\lambda ,c,H,l_i,w_{-i})P(l_i|\lambda ,c,H,w_{-i})]}\\&= \textstyle \frac{ \lambda [(\sum _{topic} \phi _{topic} (w_i)P(topic|x_i=0,l_i=1,\lambda ,c,H,w_{-i}))(1-c)+(\sum _{state} \phi _{state}(w_i)\gamma _i(state))c]}{ \lambda [(\sum _{topic} \phi _{topic} (w_i)P(topic|...))(1-c)+(\sum _{state} \phi _{state}(w_i)\gamma _i(state))c]+(1-\lambda )\phi _{Bak}(w_i)} \end{aligned}$$

For $l_i=0$, $w_i$ is generated by the background.

$$\begin{aligned}&P(l_i=0| \lambda ,c,H,w) =\textstyle \frac{ P(l_i=0|\lambda ,c,H)P(w|l_i=0,\lambda ,c,H)}{ P(w|\lambda ,c,H)}\\&= \textstyle \frac{ (1-\lambda )\phi _{Bak}(w_i)}{ \lambda [(\sum _{topic} \phi _{topic} (w_i)P(topic|...))(1-c)+(\sum _{state} \phi _{state}(w_i)\gamma _i(state))c]+(1-\lambda )\phi _{Bak}(w_i)} \end{aligned}$$

For $x_i=0$, $w_i$ is generated by the topics.

$$\begin{aligned}&P(x_i=0| \lambda ,c,H,w) =\frac{ P(x_i=0|\lambda ,c,H)P(w|x_i=0,\lambda ,c,H)}{ P(w|\lambda ,c,H)}\\&= \frac{ (1-c) P(w_i|\lambda ,c,H,x_i=0,w_{-i})P(w_{-i}|\lambda ,c,H,x_i=0)}{ P(w_i|\lambda ,c,H,w_{-i})P(w_{-i}|\lambda ,c,H)}\\&= \frac{ (1-c) \sum _{l_i} [P(w_i|\lambda ,c,H,x_i=0,l_i,w_{-i})P(l_i|\lambda ,c,H,x_i=0,w_{-i})]}{ \sum _{x_i}P(w_i|\lambda ,c,H,w_{-i},x_i)P(x_i|\lambda ,c,H,w_{-i})}\\&=\textstyle \frac{ (1-c) [(\sum _{topic} \phi _{topic} (w_i)P(topic|x_i=0,l_i=1,\lambda ,c,H,w_{-i}))\lambda +\phi _{Bak}(w_i)(1-\lambda )]}{ (1-c) [(\sum _{top} \phi _{top} (w_i)P(top|...))\lambda +\phi _{Bak}(w_i)(1-\lambda )]+c[(\sum _{sta}\phi _{sta}(w_i)\gamma _i(sta))\lambda +\phi _{Bak}(w_i)(1-\lambda )]} \end{aligned}$$

For $x_i=1$, $w_i$ is generated by the states.

$$\begin{aligned}&P(x_i=1| \lambda ,c,H,w) =\frac{ P(x_i=1|\lambda ,c,H)P(w|x_i=1,\lambda ,c,H)}{ P(w|\lambda ,c,H)}\\&= \textstyle \frac{ c[(\sum _{sta}\phi _{sta}(w_i)\gamma _i(sta))\lambda +\phi _{Bak}(w_i)(1-\lambda )]}{ (1-c) [(\sum _{top} \phi _{top} (w_i)P(top|...))\lambda +\phi _{Bak}(w_i)(1-\lambda )]+c[(\sum _{sta}\phi _{sta}(w_i)\gamma _i(sta))\lambda +\phi _{Bak}(w_i)(1-\lambda )]} \end{aligned}$$
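The final lines of the four derivations above can be evaluated together for a single word. The following is a minimal sketch under stated assumptions: `topic_prior` is a stand-in for the term $P(topic|\ldots )$, whose conditioning the text leaves elided, and `gamma` is taken to be a distribution over the K underlying states of the word's tweet. The function and argument names are hypothetical.

```python
def switch_posteriors(w, lam, c, gamma, phi_state, phi_topic, topic_prior, phi_bak):
    """Return (P(l=1|w), P(l=0|w), P(x=0|w), P(x=1|w)) for one word index w.

    gamma: posterior over the K underlying states (assumption of this sketch).
    topic_prior: stand-in for P(topic | ...), which the text leaves elided.
    """
    topic_part = sum(p * phi[w] for p, phi in zip(topic_prior, phi_topic))
    state_part = sum(g * phi[w] for g, phi in zip(gamma, phi_state))
    bak = phi_bak[w]

    # Background switch l: non-background mass versus background mass.
    fg = lam * ((1.0 - c) * topic_part + c * state_part)
    bg = (1.0 - lam) * bak
    p_l1 = fg / (fg + bg)

    # Topic/state switch x: each branch is marginalised over l.
    via_topic = (1.0 - c) * (lam * topic_part + (1.0 - lam) * bak)
    via_state = c * (lam * state_part + (1.0 - lam) * bak)
    p_x0 = via_topic / (via_topic + via_state)

    return p_l1, 1.0 - p_l1, p_x0, 1.0 - p_x0


if __name__ == "__main__":
    print(switch_posteriors(
        0, 0.5, 0.5, [1.0, 0.0],
        [[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5]], [1.0], [0.5, 0.5]))
```

Because the two values of each switch share a denominator, the pairs for l and for x each sum to one by construction.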

Forward variable: We now expand the forward variable in more detail. The initialization is as follows:

For $1\le i\le K$:

$$\begin{aligned} A_1(i)&=P(O_1,S_1=i|H)\\&=P(O_1|S_1=i,H)P(S_1=i|H)\\&=\pi _i\prod _{n=1}^NP(w_{1n}|S_1=i,H)\\&=\pi _i\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{1n})+\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{1n})P(top|\ldots )+c\phi ^i(w_{1n})\right] \right\} \end{aligned}$$

For $K+1\le i\le 2K$: $A_1(i)=0$
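The initialization can be sketched as follows. The per-word mixture is factored into a helper, since the same expression reappears in every later recursion. All names are illustrative, words are represented as integer vocabulary indices, and `topic_prior` again stands in for the elided $P(top|\ldots )$ term, treated here as a fixed vector.

```python
def word_likelihood(w, state, lam, c, phi_state, phi_topic, topic_prior, phi_bak):
    """P(w | S = state, H): background, topic, and state mixture."""
    topic_part = sum(p * phi[w] for p, phi in zip(topic_prior, phi_topic))
    return (1.0 - lam) * phi_bak[w] + lam * (
        (1.0 - c) * topic_part + c * phi_state[state][w])


def init_forward(tweet, pi, lam, c, phi_state, phi_topic, topic_prior, phi_bak):
    """A_1 over the 2K expanded states; the second K entries are zero."""
    K = len(pi)
    A1 = []
    for i in range(K):
        like = 1.0
        for w in tweet:  # product over the words of the first tweet
            like *= word_likelihood(
                w, i, lam, c, phi_state, phi_topic, topic_prior, phi_bak)
        A1.append(pi[i] * like)
    return A1 + [0.0] * K


if __name__ == "__main__":
    phi_state = [[0.9, 0.1], [0.2, 0.8]]
    print(init_forward([0, 1], [0.5, 0.5], 0.7, 0.6,
                       phi_state, [[0.5, 0.5]], [1.0], [0.5, 0.5]))
```

When every component distribution sums to one over the vocabulary, so does the mixture, which is a convenient sanity check on an implementation.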

Induction is as follows:

For $1\le j\le K$:

$$\begin{aligned} A_t(j)&=P(O_1,O_2,\ldots ,O_t,S_t=j|H)\\&=\left( \sum _{i=1}^{2K}A_{t-1}(i)\epsilon \eta _{ij}\right) P(O_t|S_t=j,H)\\&= \left( \sum _{i=1}^{2K}A_{t-1}(i)\epsilon \eta _{ij}\right) \prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{tn})\right. \\&\left. \quad +\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{tn})P(top|\ldots )+c\phi ^j(w_{tn})\right] \right\} \end{aligned}$$

For $K+1\le j\le 2K$:

$$\begin{aligned}&A_t(j)=P(O_1,O_2,\ldots ,O_t,S_t=j|H)\\&=(A_{t-1}(j)+A_{t-1}(j-K))(1-\epsilon )\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{tn})\right. \\&\left. \quad +\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{tn})P(top|\ldots )+c\phi ^{j-K}(w_{tn})\right] \right\} \end{aligned}$$

Backward variable: The initialization for the backward variable is as follows:

For $1\le i\le 2K$:

$$\begin{aligned} B_T(i)=1 \end{aligned}$$

Induction is as follows:

For $1\le i\le K$:

$$\begin{aligned}&B_t(i)=P(O_{t+1},\ldots ,O_T|S_t=i,H)\\&= \left( \sum _j^{K} \epsilon \eta _{ij} P(O_{t+1}| S_{t+1} =j,H) B_{t+1}(j)\right) \\&\quad + (1-\epsilon )P(O_{t+1}| S_{t+1} =i+K,H) B_{t+1}(i+K)\\&=\left( \sum _j^{K} \epsilon \eta _{ij} \prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})+\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )\right. \right. \right. \\&\left. \left. \left. \quad +\,c\phi ^j(w_{(t+1)n})\right] \right\} B_{t+1}(j)\right) + (1-\epsilon )\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n}) \right. \\&\left. \quad +\,\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )+c\phi ^i(w_{(t+1)n})\right] \right\} B_{t+1}(i+K)\\ \end{aligned}$$

For $K+1 \le i \le 2K$:

$$\begin{aligned}&B_{t}(i) = P(O_{t+1},\ldots ,O_T|S_t =i,H)\\&\quad = \left( \sum _j^{K} \epsilon \eta _{ij} P(O_{t+1}| S_{t+1} =j,H) B_{t+1}(j)\right) \\&\qquad + \,(1-\epsilon )P(O_{t+1}| S_{t+1} =i,H) B_{t+1}(i)\\&\quad = \left( \sum _j^{K} \epsilon \eta _{ij} \prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})+\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )\right. \right. \right. \\&\left. \left. \left. \qquad +\,c\phi ^j(w_{(t+1)n})\right] \right\} B_{t+1}(j) \right) + (1-\epsilon )\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})\right. \\&\left. \qquad +\,\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )+c\phi ^{i-K}(w_{(t+1)n})\right] \right\} B_{t+1}(i) \end{aligned}$$
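The forward and backward recursions can be sketched together over the 2K expanded states. This is an illustrative reading of the equations above, not the authors' released code. It assumes the per-tweet emission probabilities $P(O_t|S_t=k,H)$ have been precomputed into `emis`, uses 0-based indices, and, for an expanded index i in the second block of K, takes the $\eta $ row and emission of the underlying state i - K (an assumption consistent with the $\phi ^{i-K}$ term above).

```python
def forward_backward(emis, pi, eta, eps):
    """Forward and backward tables over the 2K expanded states.

    emis[t][k]: P(O_t | underlying state k, H), precomputed per tweet.
    pi: initial distribution (length K); eta: K x K transition matrix.
    eps: weight of the transition term; (1 - eps) weights the copy term.
    Index j < K receives the eps * eta term; index j >= K is the copy
    of underlying state j - K.
    """
    T, K = len(emis), len(pi)
    A = [[0.0] * (2 * K) for _ in range(T)]
    B = [[1.0] * (2 * K) for _ in range(T)]  # B_T(i) = 1

    for i in range(K):
        A[0][i] = pi[i] * emis[0][i]  # copy states start at zero

    for t in range(1, T):
        for j in range(K):
            # Transition into j from any expanded state i (row i % K of eta).
            inflow = sum(A[t - 1][i] * eta[i % K][j] for i in range(2 * K))
            A[t][j] = eps * inflow * emis[t][j]
            # Copy of state j, reached from its transition or copy version.
            A[t][j + K] = (A[t - 1][j] + A[t - 1][j + K]) * (1.0 - eps) * emis[t][j]

    for t in range(T - 2, -1, -1):
        for i in range(2 * K):
            k = i % K  # underlying state of expanded index i
            move = sum(eta[k][j] * emis[t + 1][j] * B[t + 1][j] for j in range(K))
            stay = emis[t + 1][k] * B[t + 1][k + K]
            B[t][i] = eps * move + (1.0 - eps) * stay
    return A, B


if __name__ == "__main__":
    emis = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]]
    A, B = forward_backward(emis, [0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], 0.3)
    # The sum over i of A_t(i) * B_t(i) should be the same for every t.
    print([sum(a * b for a, b in zip(A[t], B[t])) for t in range(3)])
```

A useful check on any implementation is that $\sum _i A_t(i)B_t(i)$ gives the same value, $P(O|H)$, at every t. For long tweet sequences the raw products underflow, so a practical version would rescale each step or work in log space.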

Define z as follows:

$$\begin{aligned} z_{t,n}(i)&=P(T_{tn}=i|l_{tn}=1,x_{tn}=0,w_{tn},H)\\&=\frac{ P(w_{tn}|T_{tn}=i,H,l_{tn}=1,x_{tn}=0)P(T_{tn}=i|l_{tn}=1,x_{tn}=0,H)}{ P(w_{tn}|l_{tn}=1,x_{tn}=0,H)}\\&=\frac{ \phi _{top=i}(w_{tn})P(T_{tn}=i|l_{tn}=1,x_{tn}=0,H)}{ \sum _{i}[\phi _{top=i}(w_{tn})P(T_{tn}=i|l_{tn}=1,x_{tn}=0,H)]} \end{aligned}$$
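The last line of this definition is a simple normalization over topics, sketched below. The function name and the list layout (`phi_topic[i][w]` for the word distribution of topic i, `topic_prior[i]` for $P(T_{tn}=i|l_{tn}=1,x_{tn}=0,H)$) are assumptions of this example.

```python
def topic_responsibilities(w, phi_topic, topic_prior):
    """z(i) proportional to phi_topic[i][w] * topic_prior[i], normalized over i."""
    weights = [p * phi[w] for p, phi in zip(topic_prior, phi_topic)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("word has zero probability under every topic")
    return [v / total for v in weights]


if __name__ == "__main__":
    print(topic_responsibilities(0, [[0.8, 0.2], [0.4, 0.6]], [0.5, 0.5]))
```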

Let $\xi _t(i,j)$ be the probability of being in state $S_i$ at time t, and state $S_j$ at time $t+1$, given O and other model parameters.

$$\begin{aligned} \xi _t(i,j)&=P(S_t=i,S_{t+1}=j|O,H)\\&=\frac{P(S_t=i,S_{t+1}=j,O|H)}{P(O|H)} \end{aligned}$$

To express $\xi _t(i,j)$, we define the following three terms, one for each range of i and j.

For $1\le i\le 2K$ and $1\le j\le K$:

$$\begin{aligned}&T_1=A_t(i)\epsilon \eta _{ij}P(O_{t+1}|S_{t+1}=j,H)B_{t+1}(j)\\&=A_t(i)\epsilon \eta _{ij}\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})\right. \\&\left. \quad +\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )+c\phi ^j(w_{(t+1)n})\right] \right\} B_{t+1}(j) \end{aligned}$$

For $1\le i\le K$ and $K+1\le j\le 2K$:

$$\begin{aligned}&T_2=A_t(i)(1-\epsilon )P(O_{t+1}|S_{t+1}=i+K,H)B_{t+1}(i+K)\\&=A_t(i)(1-\epsilon )\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})\right. \\&\left. \quad +\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )+c\phi ^i(w_{(t+1)n})\right] \right\} B_{t+1}(i+K) \end{aligned}$$

For $K+1\le i\le 2K$ and $K+1\le j\le 2K$:

$$\begin{aligned}&T_3=A_t(i)(1-\epsilon )P(O_{t+1}|S_{t+1}=i,H)B_{t+1}(i)\\&=A_t(i)(1-\epsilon )\prod _{n=1}^N\left\{ (1-\lambda )\phi _{Bak}(w_{(t+1)n})\right. \\&\left. \quad +\lambda \left[ (1-c)\sum _{top}\phi _{top}(w_{(t+1)n})P(top|\ldots )+c\phi ^{i-K}(w_{(t+1)n})\right] \right\} B_{t+1}(i) \end{aligned}$$

Correspondingly, $\xi _t(i,j)$ takes one of the following values, depending on the range of i and j:

$$\begin{aligned} \xi _t(i,j)&= \frac{ T_1 }{ \sum _i \sum _j (T_1 + T_2 + T_3) } \quad \text {for}\, 1\le i\le 2K,\ 1\le j\le K\\ \xi _t(i,j)&= \frac{ T_2 }{ \sum _i \sum _j (T_1 + T_2 + T_3) } \quad \text {for}\, 1\le i\le K,\ K+1\le j\le 2K\\ \xi _t(i,j)&= \frac{ T_3 }{ \sum _i \sum _j (T_1 + T_2 + T_3) } \quad \text {for}\, K+1\le i\le 2K,\ K+1\le j\le 2K \end{aligned}$$

Estimation of parameters:

We use the following equations to estimate the parameter values in the M-step.

For estimating $\epsilon $:

$$\begin{aligned} \epsilon&= \frac{ \sum _{u=1}^U \sum _{t=1}^T \sum _{i=1}^{2K} \sum _{j=1}^K \xi _t(i,j) }{ \sum _{u=1}^U \sum _{t=1}^T \sum _{i=1}^{2K} \sum _{j=1}^{2K} \xi _t(i,j) } \end{aligned}$$

For estimating $\pi $:

$$\begin{aligned} \pi _i&= \frac{ \sum _{u=1}^U \gamma _1(i) }{ \sum _{u=1}^U \sum _{i=1}^K \gamma _1(i) } \quad \text {for}\, 1 \le \,\hbox {i}\, \le \,\hbox {K} \end{aligned}$$

For estimating $\eta $:

$$\begin{aligned} \eta _{ij}&= \frac{ \sum _{u=1}^U \sum _{t=1}^T \left( \xi _t(i,j) + \xi _t(i+K,j)\right) }{ \sum _{u=1}^U \sum _{t=1}^T \sum _{j=1}^K \left( \xi _t(i,j) + \xi _t(i+K,j)\right) } \quad \text {for}\, 1\le \,\hbox {i} \,\le \,\hbox {K}, \,1\le \,\hbox {j}\,\le \,\hbox {K} \end{aligned}$$
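The three updates for $\epsilon $, $\pi $, and $\eta $ can be sketched in one pass over the E-step posteriors. The nested-list layout (`xi[u][t][i][j]`, `gamma1[u][i]`), the 0-based indexing, and the function name are assumptions of this example. It also assumes every row of the accumulated counts is non-zero.

```python
def update_transitions(xi, gamma1, K):
    """Re-estimate (eps, pi, eta) from E-step posteriors.

    xi[u][t][i][j]: pairwise posterior for user u over the 2K expanded states.
    gamma1[u][i]: posterior of expanded state i for user u's first tweet.
    """
    into_first_k = 0.0  # mass arriving in the first K states
    all_mass = 0.0
    counts = [[0.0] * K for _ in range(K)]
    for user in xi:
        for step in user:
            for i in range(2 * K):
                for j in range(2 * K):
                    all_mass += step[i][j]
                    if j < K:
                        into_first_k += step[i][j]
                        # Accumulates xi(i, j) + xi(i + K, j) for row i % K.
                        counts[i % K][j] += step[i][j]
    eps = into_first_k / all_mass
    eta = [[v / sum(row) for v in row] for row in counts]

    # pi uses only the first K entries of gamma_1, summed over users.
    first = [sum(g[i] for g in gamma1) for i in range(K)]
    pi = [v / sum(first) for v in first]
    return eps, pi, eta


if __name__ == "__main__":
    step = [[1.0 / 16.0] * 4 for _ in range(4)]
    print(update_transitions([[step]], [[0.75, 0.25, 0.0, 0.0]], K=2))
```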

For estimating $\lambda $:

$$\begin{aligned} \lambda =\frac{\sum _u\sum _t \frac{1}{N_t}\sum _{n=1}^{N_t}P(l_{tn}=1|\lambda ,c,H,w)}{UT} \end{aligned}$$

For estimating c:

$$\begin{aligned} c =\frac{\sum _u\sum _t \frac{1}{N_t}\sum _{n=1}^{N_t}P(l_{tn}=1|\lambda ,c,H,w)P(x_{tn}=1|\lambda ,c,H,w)}{\sum _u\sum _t \frac{1}{N_t}\sum _{n=1}^{N_t}P(l_{tn}=1|\lambda ,c,H,w)} \end{aligned}$$
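The updates for $\lambda $ and c can be sketched as follows. The inputs are the per-word posteriors $P(l_{tn}=1|\ldots )$ and $P(x_{tn}=1|\ldots )$, stored here as nested lists indexed by user, tweet, and word. This layout and the function name are assumptions of the example, and every tweet is assumed to contain at least one word.

```python
def update_switch_priors(p_l1, p_x1):
    """Return (lambda, c) from per-word switch posteriors.

    p_l1[u][t][n], p_x1[u][t][n]: posteriors for word n of tweet t of user u.
    """
    lam_num = 0.0
    c_num = 0.0
    n_tweets = 0  # plays the role of U * T in the text
    for user_l, user_x in zip(p_l1, p_x1):
        for tweet_l, tweet_x in zip(user_l, user_x):
            n_t = len(tweet_l)
            lam_num += sum(tweet_l) / n_t
            c_num += sum(l * x for l, x in zip(tweet_l, tweet_x)) / n_t
            n_tweets += 1
    lam = lam_num / n_tweets
    c = c_num / lam_num  # normalised by the expected non-background mass
    return lam, c


if __name__ == "__main__":
    print(update_switch_priors([[[1.0, 0.5], [0.5, 0.5]]],
                               [[[1.0, 0.0], [0.5, 0.5]]]))
```

Averaging within each tweet before summing gives every tweet the same weight regardless of its length, which matches the $1/N_t$ factor in both formulas.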

For estimating $\phi $:

$$ \begin{aligned} \phi ^i(w)&=\textstyle \frac{\sum _{u=1}^U \sum _{t=1}^T \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=1|\lambda ,c,H,O)\left( \gamma _t(i) + \gamma _t(i+K)\right) }{\sum _{u=1}^U \sum _{t=1}^T \sum _{w=1}^W \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=1|\lambda ,c,H,O)\left( \gamma _t(i) + \gamma _t(i+K)\right) } \\&\text {for}\, 1\le \,\hbox {i}\, \le \,\hbox {K}\\ \phi _{Bak}(w)&= \frac{\sum _{u=1}^U \sum _{t=1}^T \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=0|\lambda ,c,H,O) }{\sum _{u=1}^U \sum _{t=1}^T \sum _{w=1}^W \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=0|\lambda ,c,H,O) }\\ \phi _{Topic}(w)&= \textstyle \frac{ \sum _{u=1}^U \sum _{t=1}^T \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=0|\lambda ,c,H,O)z_{t,n}(Topic) }{ \sum _{u=1}^U \sum _{t=1}^T \sum _{w=1}^W \sum _{\begin{array}{c} 1 \le n \le N \\ \& \\ w=w_{tn} \end{array}} P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=0|\lambda ,c,H,O)z_{t,n}(Topic) }\\ P(T_{tn}=i|l_{tn}&=1,x_{tn}=0,H)=\textstyle \frac{ \sum _{u=1}^U\sum _{t=1}^T\sum _{n=1}^{N_t}P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=0|\lambda ,c,H,O)z_{t,n}(i)}{ \sum _{u=1}^U\sum _{t=1}^T\sum _{n=1}^{N_t}P(l_{tn}=1|\lambda ,c,H,O)P(x_{tn}=0|\lambda ,c,H,O)} \end{aligned}$$

About this article

Cite this article

Chen, L., Tozammel Hossain, K.S.M., Butler, P. et al. Syndromic surveillance of Flu on Twitter using weakly supervised temporal topic models. Data Min Knowl Disc 30, 681–710 (2016). https://doi.org/10.1007/s10618-015-0434-x

Received: 05 February 2015
Accepted: 18 August 2015
Published: 08 September 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s10618-015-0434-x
