Health-related hypothesis generation using social media data

  • Original Article
  • Social Network Analysis and Mining

Abstract

Traditional public health surveillance, also known as syndromic surveillance, is expensive and burdensome because it relies on clinical reports that health professionals author with considerable time and effort. Because of its prohibitive cost, syndromic surveillance is typically performed only for high-risk concerns such as influenza. A health surveillance system that works for numerous health concerns simultaneously would therefore be of great practical use. We present a framework that processes a stream of time-stamped social media messages. The framework produces “interest curves” that permit the generation of hypotheses regarding which health-related conditions/topics may be increasing in prevalence. We do not claim to detect an actual outbreak of a health-related condition because this framework only has access to social media messages and not a harder data source like patient records. This approach differs from prior approaches because it is not customized to detect one particular illness (e.g., influenza), as is commonly done. The inner workings of the framework can be interpreted as a transformation that converts a signal deeply embedded in the “stream of raw tweets” domain to a signal in the “health-related topics” domain. This framework’s capability is demonstrated by examining multiple interest curves related to seasonal influenza and allergies.
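To make the idea of an interest curve concrete, the following is a minimal sketch, not the framework described in the abstract. It assumes that a topic can be approximated by a hand-picked list of terms and that "interest" is the daily share of messages mentioning any of them; the function name `interest_curve`, the sample messages, and the term list are all hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def interest_curve(messages, terms):
    """Return {day: share of that day's messages mentioning any term}.

    Illustrative assumption: a topic is a fixed set of terms, matched
    against whitespace-separated, punctuation-stripped, lowercased words.
    """
    wanted = {t.lower() for t in terms}
    totals, hits = Counter(), Counter()
    for timestamp, text in messages:
        day = timestamp.date()
        totals[day] += 1
        words = {w.strip(".,!?#@\"'").lower() for w in text.split()}
        if words & wanted:
            hits[day] += 1
    # Normalize by daily volume so busy days do not look like rising interest.
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical time-stamped messages standing in for a tweet stream.
messages = [
    (datetime(2013, 1, 1, 9, 0), "Stuck in bed with the flu"),
    (datetime(2013, 1, 1, 12, 30), "Great lunch today"),
    (datetime(2013, 1, 2, 8, 15), "Flu season is here, got a fever"),
    (datetime(2013, 1, 2, 18, 45), "My fever finally broke"),
]
curve = interest_curve(messages, ["flu", "fever"])
```

A rising value from one day to the next would only suggest a hypothesis worth checking against a harder data source; it is not evidence of an outbreak.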

Acknowledgments

This work was partially supported by: the US National Science Foundation through Grant CNS-1204347, the Models of Infectious Disease Agent Study (MIDAS), under Award Number U01GM070708 from the NIGMS, The Johns Hopkins Medical School DHS Center on Preparedness and Catastrophic Event Response (PACER), under Award Number N00014-06-1-0991 from the Office of Naval Research, and Joshua M. Epstein’s NIH Director’s Pioneer Award, Number DP1OD003874 from the Office of the Director, National Institutes of Health. Finally, we would like to thank Social Network Analysis and Mining for the invitation to deepen our paper from the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

Author information

Corresponding author

Correspondence to Jon Parker.

About this article

Cite this article

Parker, J., Yates, A., Goharian, N. et al. Health-related hypothesis generation using social media data. Soc. Netw. Anal. Min. 5, 7 (2015). https://doi.org/10.1007/s13278-014-0239-8

  • DOI: https://doi.org/10.1007/s13278-014-0239-8
