Constructing Accurate Confidence Intervals When Aggregating Social Media Data for Public Health Monitoring

Daughton, Ashlynn R.; Paul, Michael J.

doi:10.1007/978-3-030-24409-5_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 843)

Included in the following conference series:

International Workshop on Health Intelligence

Abstract

Social media data are widely used to infer health-related information (e.g., the number of individuals with symptoms). A typical approach is to use a machine learning classifier to aggregate and count the information of interest. However, this approach fails to account for errors made by the classifier. This paper summarizes data mining concepts that account for classifier error when counting data instances, and then extends these ideas to propose a new algorithm for constructing confidence intervals of social media estimates. We show that these intervals are substantially more accurate than those from standard approaches on two influenza-related Twitter datasets.

LA-UR-18-24425.
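
As a rough illustration of the idea in the abstract, the sketch below corrects a classifier-based count for classifier error and then bootstraps an interval around it. It is not the algorithm proposed in the chapter: it assumes the standard "adjusted count" correction from the quantification literature and a percentile bootstrap. All function names (`adjusted_proportion`, `bootstrap_interval`) and the toy inputs are hypothetical.

```python
import random


def adjusted_proportion(observed_rate, tpr, fpr):
    """Correct an observed positive rate for classifier error.

    Uses the adjusted-count formula (observed - fpr) / (tpr - fpr),
    clipped to [0, 1]. Assumes tpr and fpr were estimated on held-out data.
    """
    if tpr <= fpr:
        raise ValueError("correction requires tpr > fpr")
    estimate = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


def _rate(flags):
    # Fraction of items flagged positive (flags are 0/1 predictions).
    return sum(flags) / len(flags)


def _resample(items, rng):
    # Draw a bootstrap sample of the same size, with replacement.
    return [rng.choice(items) for _ in items]


def bootstrap_interval(predictions, val_pos_preds, val_neg_preds,
                       n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the corrected proportion.

    predictions:   0/1 classifier outputs on the unlabeled target data
    val_pos_preds: 0/1 outputs on validation items whose true label is 1
    val_neg_preds: 0/1 outputs on validation items whose true label is 0
    Resampling the validation outputs as well lets uncertainty in the
    tpr/fpr estimates widen the interval.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        tpr = _rate(_resample(val_pos_preds, rng))
        fpr = _rate(_resample(val_neg_preds, rng))
        if tpr <= fpr:
            continue  # skip degenerate resamples where the correction is undefined
        observed = _rate(_resample(predictions, rng))
        estimates.append(adjusted_proportion(observed, tpr, fpr))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[min(len(estimates) - 1,
                          int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


# Toy data: 30% of target items flagged; validation tpr = 0.8, fpr = 0.1.
predictions = [1] * 30 + [0] * 70
val_pos_preds = [1] * 80 + [0] * 20
val_neg_preds = [1] * 10 + [0] * 90

point = adjusted_proportion(_rate(predictions), 0.8, 0.1)
lower, upper = bootstrap_interval(predictions, val_pos_preds, val_neg_preds)
```

With these toy numbers the raw classifier rate is 0.30, while the corrected point estimate is (0.30 - 0.10) / (0.80 - 0.10), or roughly 0.286. An interval built from the raw counts alone would ignore this shift and the extra uncertainty in the error-rate estimates.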

Author information

Authors and Affiliations

Information Science, University of Colorado, Boulder, 80309, CO, USA
Ashlynn R. Daughton & Michael J. Paul
Analytics, Intelligence, and Technology, Los Alamos National Laboratory, Los Alamos, 87545, NM, USA
Ashlynn R. Daughton

Corresponding author

Correspondence to Ashlynn R. Daughton.

Editor information

Editors and Affiliations

Department of Pediatrics, The University of Tennessee Health Science Center – Oak-Ridge National Lab (UTHSC-ORNL) Center for Biomedical Informatics, Memphis, TN, USA
Arash Shaban-Nejad
School of Nursing, University of Minnesota, Minneapolis, MN, USA
Martin Michalowski

About this chapter

Cite this chapter

Daughton, A.R., Paul, M.J. (2020). Constructing Accurate Confidence Intervals When Aggregating Social Media Data for Public Health Monitoring. In: Shaban-Nejad, A., Michalowski, M. (eds) Precision Health and Medicine. W3PHAI 2019. Studies in Computational Intelligence, vol 843. Springer, Cham. https://doi.org/10.1007/978-3-030-24409-5_2

DOI: https://doi.org/10.1007/978-3-030-24409-5_2
Published: 02 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24408-8
Online ISBN: 978-3-030-24409-5
eBook Packages: Intelligent Technologies and Robotics (R0)
