Constructing Accurate Confidence Intervals When Aggregating Social Media Data for Public Health Monitoring

Precision Health and Medicine (W3PHAI 2019)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 843))

Abstract

Social media data are widely used to infer health-related information (e.g., the number of individuals with symptoms). A typical approach is to apply a machine learning classifier to the data and count the instances it labels as relevant. However, this approach fails to account for errors made by the classifier. This paper summarizes data mining concepts that account for classifier error when counting data instances, and then extends these ideas to propose a new algorithm for constructing confidence intervals of social media estimates. We show that the resulting intervals are substantially more accurate than those from standard approaches on two influenza-related Twitter datasets.
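To make the problem concrete, the following is a minimal sketch of one well-known way to correct a classifier-based count for classifier error (the "adjusted count" correction from the quantification literature), combined with a simple percentile bootstrap. It is not the algorithm proposed in this chapter, which is not described in the abstract. The function names, the use of a labeled validation set to estimate the true and false positive rates, and the choice to resample both the validation set and the target predictions are all assumptions made for illustration.

```python
import random


def rates(labels, preds):
    """Estimate (tpr, fpr) from binary validation labels and predictions."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    if not pos or not neg:
        return None  # a rate is undefined without both classes
    return sum(pos) / len(pos), sum(neg) / len(neg)


def adjusted_prevalence(preds, tpr, fpr):
    """Adjusted count: (observed - fpr) / (tpr - fpr), clipped to [0, 1]."""
    if tpr <= fpr:
        return None  # correction is undefined for an uninformative classifier
    observed = sum(preds) / len(preds)
    return min(1.0, max(0.0, (observed - fpr) / (tpr - fpr)))


def bootstrap_interval(target_preds, val_labels, val_preds,
                       n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the adjusted prevalence.

    Hypothetical helper: resamples the validation set (uncertainty in
    tpr/fpr) and the target predictions (sampling uncertainty).
    """
    rng = random.Random(seed)
    val = list(zip(val_labels, val_preds))
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(val, k=len(val))
        r = rates([y for y, _ in sample], [p for _, p in sample])
        if r is None:
            continue
        resampled = rng.choices(target_preds, k=len(target_preds))
        est = adjusted_prevalence(resampled, *r)
        if est is not None:
            estimates.append(est)
    if not estimates:
        raise ValueError("no valid bootstrap replicates")
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[max(0, int((1 - alpha / 2) * len(estimates)) - 1)]
    return lo, hi


# Toy data: validation set with tpr = 0.8 and fpr = 0.1,
# and a target sample in which 30% of items are predicted positive.
val_labels = [1] * 100 + [0] * 100
val_preds = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 90
target_preds = [1] * 300 + [0] * 700

tpr, fpr = rates(val_labels, val_preds)
point = adjusted_prevalence(target_preds, tpr, fpr)
lo, hi = bootstrap_interval(target_preds, val_labels, val_preds)
```

In this toy setting the raw classifier count suggests a prevalence of 0.30, while the corrected point estimate is (0.30 - 0.10) / (0.80 - 0.10), roughly 0.29. The interval also reflects uncertainty in the estimated error rates, which a naive binomial interval on the raw count would ignore.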

LA-UR-18-24425.

Author information

Corresponding author

Correspondence to Ashlynn R. Daughton.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Daughton, A.R., Paul, M.J. (2020). Constructing Accurate Confidence Intervals When Aggregating Social Media Data for Public Health Monitoring. In: Shaban-Nejad, A., Michalowski, M. (eds) Precision Health and Medicine. W3PHAI 2019. Studies in Computational Intelligence, vol 843. Springer, Cham. https://doi.org/10.1007/978-3-030-24409-5_2
