DiNer - On Building Multilingual Disease-News Profiler

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 12130)

Abstract

A disease-news profiler gathers online news articles that contain information related to diseases. The need for such a profiler arises in epidemic intelligence, where it serves as an information system for diseases. Health agencies and researchers can use it to track an epidemic or to build a knowledge base for diseases. Most existing profiling techniques target specific languages such as English, Arabic, Chinese, Spanish or Russian, and largely ignore many Asian and resource-poor languages. A multilingual disease-news profiler offers clear advantages in coverage, timeliness, quality and information enrichment. In this paper we propose DiNer, a novel system for filtering and indexing disease news. We have developed a language-agnostic, low-resource filtering technique that uses a Support Vector Machine (SVM) classifier to identify instances of disease news in any given news corpus. We describe our approach to feature engineering and the development of a disease-related corpus for training the SVM classifier. We tested the filtering module on four languages: English, Hindi, Punjabi and Gujarati. Our filtering technique performs significantly better than the baseline approach in terms of both F-score (>5%) and recall (>50%) across languages.
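
The abstract compares the filter against a baseline using recall and F-score. As a reminder of how these metrics are computed for a binary disease-news filter, here is a minimal sketch. The function name `precision_recall_f1` and the label lists are hypothetical and purely illustrative; they are not the chapter's data or results, and the F-score is assumed to be the balanced F1 variant.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for the positive class (1 = disease news)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)      # disease news correctly kept
    fp = sum(1 for g, p in pairs if not g and p)  # other news wrongly kept
    fn = sum(1 for g, p in pairs if g and not p)  # disease news missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (assumption, not taken from the chapter).
gold = [1, 1, 1, 1, 0, 0, 0, 0]
baseline = [1, 0, 0, 0, 0, 0, 0, 0]  # very conservative filter
filtered = [1, 1, 1, 0, 1, 0, 0, 0]  # filter that keeps more articles

print("baseline:", precision_recall_f1(gold, baseline))
print("filtered:", precision_recall_f1(gold, filtered))
```

In this toy case the conservative baseline has perfect precision but misses most disease news, so its recall and F1 are low. This is the kind of gap a recall-oriented comparison is meant to expose.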

Notes

  1. http://www.promedmail.org/
  2. http://www.healthmap.org/en/
  3. http://idsp.nic.in/
  4. http://www.cdc.gov/diseasesconditions/
  5. https://translate.google.co.in/
  6. https://www.google.com/inputtools/try/
  7. http://www.shabdkosh.com/

Acknowledgment

We are grateful to IIT Roorkee for providing the resources to conduct this research, and we thank our colleagues at IIT Roorkee whose insight and expertise greatly assisted it.

Author information

Corresponding author

Correspondence to Sajal Rustagi.

Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Rustagi, S., Patel, D. (2020). DiNer - On Building Multilingual Disease-News Profiler. In: Hameurlain, A., Tjoa, A. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIII. Lecture Notes in Computer Science, vol 12130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62199-8_5

  • DOI: https://doi.org/10.1007/978-3-662-62199-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-62198-1

  • Online ISBN: 978-3-662-62199-8

  • eBook Packages: Computer Science, Computer Science (R0)
