Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2021)

Abstract

The volume of biological literature has been growing rapidly, and with it the number of biological pathway figures published in these papers. Each pathway figure encodes rich biological information, consisting of gene names and the relations between genes. However, manual curation of pathway figures requires tremendous time and labor. While advanced image understanding models may accelerate curation, their accuracy still needs improvement. Since each pathway figure is associated with a paper, most of the gene names and gene relations in a pathway figure also appear in the paper's text, so text mining of that text can be used to improve the image recognition results. In this paper, we applied a fuzzy match method to detect gene names using different "gene dictionaries," and used gene co-occurrence in the plain text to suggest gene relations. We demonstrate that the performance of image understanding for both gene name recognition and gene relation extraction can be improved with the help of text mining methods. All the data and code are available at GitHub (https://github.com/lyfer233/Text-Mining-Enhancements-for-Image-Recognition-of-Gene-Names-and-Gene-Relations).
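The two text-mining ideas in the abstract (fuzzy matching of recognized gene names against a gene dictionary, and gene co-occurrence in the paper text as evidence for a relation) can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the GitHub repository for that). It assumes Levenshtein edit distance as one plausible fuzzy-match criterion, a maximum distance of one edit, a small hypothetical in-memory list standing in for a "gene dictionary", and a single sentence as the co-occurrence window; all function names are hypothetical.

```python
import re
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fuzzy_match(ocr_token, gene_dictionary, max_distance=1):
    """Hypothetical helper: map a possibly misread OCR token to the closest
    dictionary gene within max_distance edits (case-insensitive), else None.
    Ties go to the gene listed first in the dictionary."""
    token = ocr_token.upper()
    best, best_dist = None, max_distance + 1
    for gene in gene_dictionary:
        d = levenshtein(token, gene.upper())
        if d < best_dist:
            best, best_dist = gene, d
    return best


def cooccurring_pairs(text, genes):
    """Hypothetical helper: count unordered gene pairs that appear in the same
    sentence. Gene names are matched case-sensitively as whole words."""
    counts = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        present = sorted(g for g in genes if g in words)
        for pair in combinations(present, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


if __name__ == "__main__":
    genes = ["MTOR", "IRE1", "XBP1", "ATF6"]  # hypothetical dictionary
    print(fuzzy_match("1RE1", genes))  # IRE1 (OCR read "I" as "1")
    print(fuzzy_match("ABCD", genes))  # None (no gene within one edit)
    text = "MTOR activates IRE1. XBP1 splicing follows IRE1 activation."
    print(cooccurring_pairs(text, genes))
    # {('IRE1', 'MTOR'): 1, ('IRE1', 'XBP1'): 1}
```

In this sketch, a co-occurring pair is only a candidate relation: it can be used to confirm or suggest an edge between two genes recognized in the figure, but co-occurrence alone does not indicate the direction or type of the relation.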

Acknowledgements

The authors would like to express their gratitude to Clement Essien, Drs. Richard Hammer and Dmitriy Shin for helpful discussions. This research is supported by the National Library of Medicine of the National Institutes of Health (NIH) under award 5R01LM013392.

Author information

Corresponding author

Correspondence to Dong Xu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ren, Y. et al. (2022). Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_11

  • DOI: https://doi.org/10.1007/978-3-031-20837-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20836-2

  • Online ISBN: 978-3-031-20837-9

  • eBook Packages: Computer Science, Computer Science (R0)