
An Ontology for Generalized Disease Incidence Detection on Twitter

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10334)


Abstract

In this paper, we present an ontology of disease-related concepts designed for detecting disease incidence in tweets. Unlike previous keyword-based systems and topic-modeling approaches, our ontological approach lets us apply more stringent relevance criteria, such as spatial and temporal characteristics, whilst giving a stronger guarantee that the resulting models will perform well on new data that may be lexically divergent. We achieve this by training supervised learners on concepts rather than on individual words. Effectively, we map every possible word to a fixed-length lexicon, thereby eliminating lexical divergence between the training data and new data. For training, we use a dataset containing mentions of influenza, the common cold and Listeria, and we use the learned models to classify datasets containing mentions of an arbitrary selection of other diseases. We show that, on lexically divergent data, our ontological approach yields models whose performance is not only good but also stable, compared with a baseline that uses word-level lookup with unigram bag-of-words features. We also show that word vectors can be learned directly from our concepts to achieve even better results.
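The core idea above, replacing word-level features with a fixed-length vector of concept counts, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon `CONCEPT_LEXICON`, the concept labels (`DISEASE`, `SELF`, `POSSESSION`, and so on), the catch-all `OTHER` concept for unmapped words, and the naive tokenizer are all hypothetical stand-ins for the paper's actual ontology and preprocessing.

```python
from collections import Counter

# HYPOTHETICAL concept lexicon standing in for the ontology; the paper's
# real concept inventory and mapping rules are not reproduced here.
CONCEPT_LEXICON = {
    "flu": "DISEASE", "influenza": "DISEASE", "cold": "DISEASE",
    "listeria": "DISEASE", "measles": "DISEASE", "malaria": "DISEASE",
    "i": "SELF", "me": "SELF", "my": "SELF",
    "have": "POSSESSION", "got": "POSSESSION", "caught": "POSSESSION",
    "today": "PRESENT_TIME", "now": "PRESENT_TIME",
    "yesterday": "PAST_TIME",
}
# ASSUMPTION: every word missing from the lexicon maps to one catch-all
# concept, so the feature space has a fixed length whatever the vocabulary.
UNKNOWN = "OTHER"
CONCEPTS = sorted(set(CONCEPT_LEXICON.values()) | {UNKNOWN})


def tokenize(text):
    """Naive tokenizer: split on whitespace, strip edge punctuation, lowercase."""
    tokens = (tok.strip(".,!?#@").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def concept_vector(text):
    """Count concepts in a tweet; the result always has len(CONCEPTS) entries."""
    counts = Counter(CONCEPT_LEXICON.get(tok, UNKNOWN) for tok in tokenize(text))
    return [counts[c] for c in CONCEPTS]


def word_vector(text, vocabulary):
    """Unigram bag-of-words baseline over a vocabulary fixed at training time."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


# Train on an influenza tweet, then score a tweet about a different disease.
train_tweet = "I caught the flu today"
new_tweet = "I caught the measles today"

vocab = sorted(set(tokenize(train_tweet)))  # ['caught', 'flu', 'i', 'the', 'today']
print(word_vector(new_tweet, vocab))        # [1, 0, 1, 1, 1]: "measles" is invisible
print(concept_vector(train_tweet) == concept_vector(new_tweet))  # True
```

Under these assumptions, the measles tweet produces exactly the same concept vector as the influenza training tweet, whereas the word-level vector has no feature for "measles" at all, so a classifier trained on the influenza vocabulary receives no signal from the disease term.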


Notes

  1. https://github.com/MarkMagumba/Twitter-Disease-incidence-Description-Language-Ontology

  2. General Architecture for Text Engineering.



Author information

Correspondence to Mark Abraham Magumba.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Magumba, M.A., Nabende, P. (2017). An Ontology for Generalized Disease Incidence Detection on Twitter. In: Martínez de Pisón, F., Urraca, R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2017. Lecture Notes in Computer Science, vol 10334. Springer, Cham. https://doi.org/10.1007/978-3-319-59650-1_4


  • DOI: https://doi.org/10.1007/978-3-319-59650-1_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59649-5

  • Online ISBN: 978-3-319-59650-1

  • eBook Packages: Computer Science, Computer Science (R0)
