
Named Entity Recognition on CORD-19 Bio-Medical Dataset with Tolerance Rough Sets

Transactions on Rough Sets XXIII

Part of the book series: Lecture Notes in Computer Science (TRS, volume 13610)


Abstract

Biomedical named entity recognition (BioNER) is becoming increasingly important to biomedical research, owing both to the proliferation of published articles and to the COVID-19 pandemic. This paper addresses the task of automatically finding and recognizing COVID-related biomedical entity types (e.g., virus, cell, therapeutic) with tolerance rough sets. The task includes (i) extracting nouns and their co-occurring contextual patterns from a large BioNER dataset related to COVID-19, and (ii) annotating unlabelled data with a semi-supervised learning algorithm that uses co-occurrence statistics. Using natural language processing methods, 465,250 noun phrases and 6,222,196 contextual patterns were extracted from 29,500 articles. Three categories have been successfully classified so far: virus, cell, and therapeutic. Early precision@N results demonstrate that our proposed tolerant pattern learner (TPL) is able to constrain concept drift in all three categories during the iterative learning process.
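The abstract describes TPL only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each noun phrase is represented by the set of contextual patterns it co-occurs with, that two noun phrases are tolerant when the Sørensen-Dice similarity of their pattern sets is at least a threshold θ (the abstract does not name the similarity measure), and that each iteration promotes a few unlabelled nouns from the upper approximation of the current category. All function names, the scoring rule, and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy input: (noun phrase, contextual pattern) pairs, where "X"
# marks the slot the noun phrase fills. Real input would come from the
# noun-phrase and pattern extraction step described in the abstract.
PAIRS = [
    ("sars-cov-2", "infection with X"), ("sars-cov-2", "X replication"),
    ("sars-cov-2", "strain of X"),
    ("mers-cov", "infection with X"), ("mers-cov", "X replication"),
    ("influenza", "X replication"), ("influenza", "strain of X"),
    ("macrophage", "activation of X"), ("macrophage", "X proliferation"),
    ("macrophage", "infection with X"),
    ("t cell", "activation of X"), ("t cell", "X proliferation"),
    ("remdesivir", "treatment with X"), ("remdesivir", "dose of X"),
]


def build_contexts(pairs):
    """Map each noun phrase to the set of patterns it co-occurs with."""
    contexts = defaultdict(set)
    for noun, pattern in pairs:
        contexts[noun].add(pattern)
    return dict(contexts)


def dice(a, b):
    """Sørensen-Dice similarity of two pattern sets (assumed measure)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def tolerance_class(noun, contexts, theta):
    """Nouns at least theta-similar to `noun`; the relation is reflexive and
    symmetric but not necessarily transitive, as in tolerance rough sets."""
    return {other for other in contexts
            if other == noun or dice(contexts[noun], contexts[other]) >= theta}


def upper_approximation(labelled, contexts, theta):
    """Nouns whose tolerance class intersects the labelled set."""
    return {n for n in contexts if tolerance_class(n, contexts, theta) & labelled}


def lower_approximation(labelled, contexts, theta):
    """Nouns whose tolerance class lies entirely inside the labelled set."""
    return {n for n in contexts if tolerance_class(n, contexts, theta) <= labelled}


def bootstrap(seeds, contexts, theta, iterations, per_iter=2):
    """Hypothetical semi-supervised loop: each iteration promotes at most
    `per_iter` unlabelled nouns from the upper approximation, ranked by the
    share of their tolerance neighbours that are already labelled.
    Returns nouns in promotion order, which serves as the ranking."""
    labelled = list(seeds)
    for _ in range(iterations):
        current = set(labelled)
        scores = {}
        for cand in contexts:
            if cand in current:
                continue
            neighbours = tolerance_class(cand, contexts, theta) - {cand}
            if neighbours & current:  # cand is in the upper approximation
                scores[cand] = len(neighbours & current) / len(neighbours)
        best = sorted(scores, key=lambda c: (-scores[c], c))[:per_iter]
        if not best:
            break
        labelled.extend(best)
    return labelled


def precision_at_n(ranked, gold, n):
    """Fraction of the top-n ranked nouns that belong to the gold category."""
    if n <= 0:
        return 0.0
    return sum(1 for noun in ranked[:n] if noun in gold) / n
```

On this toy data, `bootstrap(["sars-cov-2"], build_contexts(PAIRS), theta=0.6, iterations=3)` promotes `influenza` and `mers-cov` and then stops, giving a precision@3 of 1.0 against the gold set {sars-cov-2, mers-cov, influenza}. Lowering θ to 0.3 lets `macrophage` (a cell) into the virus category after a single iteration, a small illustration of the concept drift that the abstract reports TPL is able to constrain. Promoting only a few top-scoring candidates per iteration is one common way to slow drift; it is an assumption here, not a detail taken from the paper.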

S. Ramanna: This work is dedicated to Prof. Z. Pawlak on his 95th birthday.


Notes

  1. https://www.nlm.nih.gov/bsd/pmresources.html.

  2. http://lemurproject.org/clueweb09/.


Acknowledgments

Seeratpal's work was supported by the University of Winnipeg 2020 and 2021 NSERC Undergraduate Research Awards (USRA). Sheela Ramanna's work was supported by NSERC Discovery Grant #194376. The authors wish to acknowledge the help of Rajesh Jaiswal in preprocessing the dataset and Christopher Henry for providing the GPU computing platform.

Author information


Corresponding author

Correspondence to Sheela Ramanna.


Copyright information

© 2022 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter


Cite this chapter

Jaura, S., Ramanna, S. (2022). Named Entity Recognition on CORD-19 Bio-Medical Dataset with Tolerance Rough Sets. In: Peters, J.F., Skowron, A., Bhaumik, R.N., Ramanna, S. (eds) Transactions on Rough Sets XXIII. Lecture Notes in Computer Science, vol 13610. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-66544-2_3


  • DOI: https://doi.org/10.1007/978-3-662-66544-2_3


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-66543-5

  • Online ISBN: 978-3-662-66544-2

  • eBook Packages: Computer Science, Computer Science (R0)
