Abstract
Social media data are widely used to infer health-related information (e.g., the number of individuals with symptoms). A typical approach is to use a machine learning classifier to identify and count the instances of interest. However, this approach fails to account for errors made by the classifier. This paper summarizes data mining concepts that account for classifier error when counting data instances, and then extends these ideas to propose a new algorithm for constructing confidence intervals of social media estimates, which we show to be substantially more accurate than standard approaches on two influenza-related Twitter datasets.
LA-UR-18-24425.
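The general idea in the abstract, correcting a classifier-based count for classifier error and attaching a confidence interval to it, can be illustrated with a minimal sketch. This is not the algorithm proposed in the chapter. It assumes a binary classifier, uses the standard "adjusted count" correction from the quantification literature (observed positive rate corrected by the true and false positive rates measured on labeled validation data), and builds a percentile bootstrap interval by resampling both the validation set and the target predictions. All function and variable names are hypothetical.

```python
import random


def rates(y_true, y_pred):
    """Return (tpr, fpr) of binary predictions against binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


def adjusted_proportion(preds, tpr, fpr):
    """Adjusted count: correct the observed positive rate for classifier error."""
    observed = sum(preds) / len(preds)
    if tpr - fpr <= 0:
        # Correction is undefined; fall back to the uncorrected rate (assumption).
        return observed
    # Clip to [0, 1] because the correction can overshoot.
    return min(1.0, max(0.0, (observed - fpr) / (tpr - fpr)))


def bootstrap_interval(val_true, val_pred, target_pred,
                       n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the adjusted positive proportion."""
    rng = random.Random(seed)
    n_val, n_tgt = len(val_true), len(target_pred)
    estimates = []
    for _ in range(n_boot):
        # Resample the validation set to capture uncertainty in tpr and fpr.
        idx = [rng.randrange(n_val) for _ in range(n_val)]
        tpr, fpr = rates([val_true[i] for i in idx],
                         [val_pred[i] for i in idx])
        # Resample the target predictions to capture sampling uncertainty.
        sample = [target_pred[rng.randrange(n_tgt)] for _ in range(n_tgt)]
        estimates.append(adjusted_proportion(sample, tpr, fpr))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Synthetic example: 100 labeled positives, 100 labeled negatives.
    val_true = [1] * 100 + [0] * 100
    val_pred = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 90
    # 1000 unlabeled messages, 310 of which the classifier flags as positive.
    target_pred = [1] * 310 + [0] * 690
    tpr, fpr = rates(val_true, val_pred)
    print(adjusted_proportion(target_pred, tpr, fpr))
    print(bootstrap_interval(val_true, val_pred, target_pred))
```

Resampling the validation set as well as the target set is a design choice in this sketch: an interval that treats the true and false positive rates as fixed would ignore one source of uncertainty and tend to be too narrow.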
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Daughton, A.R., Paul, M.J. (2020). Constructing Accurate Confidence Intervals When Aggregating Social Media Data for Public Health Monitoring. In: Shaban-Nejad, A., Michalowski, M. (eds) Precision Health and Medicine. W3PHAI 2019. Studies in Computational Intelligence, vol 843. Springer, Cham. https://doi.org/10.1007/978-3-030-24409-5_2
DOI: https://doi.org/10.1007/978-3-030-24409-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24408-8
Online ISBN: 978-3-030-24409-5
eBook Packages: Intelligent Technologies and Robotics (R0)