PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2017)

Abstract

The Platform for Automated Extraction of Animal Disease Information from the Web (PADI-web) is a multilingual text mining tool for the automatic detection, classification, and extraction of disease outbreak information from online news articles. PADI-web currently monitors the Web for nine animal infectious diseases and eight syndromes in five animal hosts. The classification module uses a supervised machine learning approach to filter relevant news, with an overall accuracy of 0.94. The classification of relevant news into five topic categories (confirmed, suspected, or unknown outbreak, preparedness, and impact) obtained an overall accuracy of 0.75. In the first six months of its implementation (January–June 2016), PADI-web detected 73% of the outbreaks of African swine fever, 20% of those of foot-and-mouth disease, 13% of those of bluetongue, and 62% of those of highly pathogenic avian influenza. The information extraction module of PADI-web obtained F-scores of 0.80 for locations, 0.85 for dates, 0.95 for diseases, 0.95 for hosts, and 0.85 for case numbers.

PADI-web allows complementary disease surveillance in the domain of animal health.
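As a reading aid for the accuracy and F-score figures quoted above, the following minimal sketch shows how such metrics are conventionally computed from evaluation counts. It assumes the standard definitions (accuracy as the share of correct decisions, F-score as the harmonic mean of precision and recall); the abstract does not state which F-score variant or averaging scheme was used. The `Counts` class, the function names, and all numbers are hypothetical and are not taken from the PADI-web evaluation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical evaluation counts for one extracted entity type."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(c: Counts) -> float:
    """Share of extracted items that are correct."""
    denom = c.true_positives + c.false_positives
    return c.true_positives / denom if denom else 0.0


def recall(c: Counts) -> float:
    """Share of expected items that were extracted."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else 0.0


def f_score(c: Counts) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(correct: int, total: int) -> float:
    """Share of classification decisions that match the reference label."""
    return correct / total if total else 0.0


# Illustrative numbers only (assumption, not PADI-web data).
example = Counts(true_positives=80, false_positives=20, false_negatives=20)
print(round(f_score(example), 2))
print(round(accuracy(correct=94, total=100), 2))
```

With these made-up counts, precision and recall are both 0.80, so the F-score is also 0.80, and 94 correct decisions out of 100 give an accuracy of 0.94.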

Notes

  1. https://www.plateforme-esa.fr/.

  2. https://padi-web.cirad.fr/en/.

  3. http://aims.fao.org/vest-registry/vocabularies/agrovoc.

  4. https://azure.microsoft.com/en-gb/services/cognitive-services/translator-text-api/.

  5. This definition is based only on the semantics of the news article and does not take into account official confirmation by a formal source.

  6. https://www.oie.int/en/animal-health-in-the-world/technical-disease-cards/.

Acknowledgements

We thank J. de Goër, B. Belot, C. Hemeury, M. Devaud, and T. Filiol for their contributions to the development of PADI-web. We also thank the members of the French Epidemic Intelligence Team in Animal Health for their constructive comments during the development of PADI-web. This work has been supported by the French General Directorate for Food (DGAL), the French Agricultural Research Centre for International Development (CIRAD), the SONGES Project (FEDER and Occitanie), and the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004 (#DigitAg). This work has also been funded by the “Monitoring outbreak events for disease surveillance in a data science context” (MOOD) project from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 874850 (https://mood-h2020.eu/).

Author information

Corresponding author

Correspondence to Sarah Valentin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Valentin, S. et al. (2020). PADI-web: An Event-Based Surveillance System for Detecting, Classifying and Processing Online News. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2017. Lecture Notes in Computer Science, vol 12598. Springer, Cham. https://doi.org/10.1007/978-3-030-66527-2_7

  • DOI: https://doi.org/10.1007/978-3-030-66527-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66526-5

  • Online ISBN: 978-3-030-66527-2

  • eBook Packages: Computer Science; Computer Science (R0)
