A Data-Driven Score Model to Assess Online News Articles in Event-Based Surveillance System

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1577)

Abstract

Online news sources are popular resources for learning about current health situations and for developing event-based surveillance (EBS) systems. However, access to diverse information originating from multiple sources can misinform stakeholders and eventually lead to the reporting of false health risks. The existing literature contains several techniques for evaluating data quality to minimize the effects of misleading information. However, these methods rely only on the extraction of spatiotemporal information to represent health events. To address this research gap, a score-based technique is proposed to quantify the data quality of online news articles through three assessment measures: 1) news article metadata, 2) content analysis, and 3) epidemiological entity extraction with NLP to weight the contextual information. The results are calculated using classification metrics with two evaluation approaches: 1) a strict approach and 2) a flexible approach. The obtained results show a significant enhancement in data quality through the filtering of irrelevant news, which can potentially reduce false alert generation in EBS systems.
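As a rough illustration of how a score-based filter of this kind could be wired together, the sketch below combines three per-article sub-scores (metadata, content, epidemiological entities) into one weighted score, thresholds it to decide relevance, and evaluates the decisions with standard classification metrics. The weights, the threshold, the `ArticleScores` type, and all function names are hypothetical and are not taken from the paper. The abstract does not define the strict and flexible evaluation approaches, so the sketch evaluates a single set of labels only.

```python
from dataclasses import dataclass


@dataclass
class ArticleScores:
    """Hypothetical per-article sub-scores, each assumed to lie in [0, 1]."""
    metadata: float   # e.g. completeness of source, date, author fields
    content: float    # e.g. result of a content analysis step
    entities: float   # e.g. coverage of extracted epidemiological entities


# Illustrative weights and threshold; the paper's actual values are not given here.
WEIGHTS = {"metadata": 0.3, "content": 0.3, "entities": 0.4}
THRESHOLD = 0.5


def quality_score(s: ArticleScores) -> float:
    """Weighted sum of the three sub-scores."""
    return (WEIGHTS["metadata"] * s.metadata
            + WEIGHTS["content"] * s.content
            + WEIGHTS["entities"] * s.entities)


def is_relevant(s: ArticleScores) -> bool:
    """Keep an article when its score reaches the threshold."""
    return quality_score(s) >= THRESHOLD


def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for two equal-length lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    articles = [
        ArticleScores(metadata=0.9, content=0.8, entities=0.7),
        ArticleScores(metadata=0.2, content=0.1, entities=0.0),
    ]
    predicted = [is_relevant(a) for a in articles]
    print(predicted)
    print(precision_recall_f1(predicted, [True, False]))
```

A weighted sum is only one way to aggregate the three measures; it is used here because it keeps each measure's contribution easy to inspect when tuning the threshold.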

Acknowledgments

This study was partially funded by EU grant 874850 MOOD and is catalogued as MOOD023. The contents of this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

Author information

Corresponding author

Correspondence to Syed Mehtab Alam.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Alam, S.M., Arsevska, E., Roche, M., Teisseire, M. (2022). A Data-Driven Score Model to Assess Online News Articles in Event-Based Surveillance System. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2021. Communications in Computer and Information Science, vol 1577. Springer, Cham. https://doi.org/10.1007/978-3-031-04447-2_18

  • DOI: https://doi.org/10.1007/978-3-031-04447-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04446-5

  • Online ISBN: 978-3-031-04447-2

  • eBook Packages: Computer Science (R0)
