Abstract
Traditional public health surveillance, also known as syndromic surveillance, is expensive and burdensome because it relies on clinical reports that health professionals author with considerable time and effort. Because of its prohibitive cost, syndromic surveillance is typically performed only for high-risk concerns such as influenza. A health surveillance system that works for numerous health concerns simultaneously would therefore be of great practical use. We present a framework that processes a stream of time-stamped social media messages and produces “interest curves” that permit the generation of hypotheses regarding which health-related conditions or topics may be increasing in prevalence. We do not claim to detect an actual outbreak of a health-related condition, because the framework has access only to social media messages and not to a harder data source such as patient records. This approach differs from prior approaches in that it is not customized to detect one particular illness (e.g., influenza), as is commonly done. The inner workings of the framework can be interpreted as a transformation that converts a signal deeply embedded in the “stream of raw tweets” domain into a signal in the “health-related topics” domain. The framework’s capability is demonstrated by examining multiple interest curves related to seasonal influenza and allergies.









Acknowledgments
This work was partially supported by the US National Science Foundation through Grant CNS-1204347; the Models of Infectious Disease Agent Study (MIDAS), under Award Number U01GM070708 from the NIGMS; the Johns Hopkins Medical School DHS Center on Preparedness and Catastrophic Event Response (PACER), under Award Number N00014-06-1-0991 from the Office of Naval Research; and Joshua M. Epstein’s NIH Director’s Pioneer Award, Number DP1OD003874 from the Office of the Director, National Institutes of Health. Finally, we thank Social Network Analysis and Mining for the invitation to extend our paper from the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
Cite this article
Parker, J., Yates, A., Goharian, N. et al. Health-related hypothesis generation using social media data. Soc. Netw. Anal. Min. 5, 7 (2015). https://doi.org/10.1007/s13278-014-0239-8
DOI: https://doi.org/10.1007/s13278-014-0239-8