Abstract
The biological literature is growing rapidly, and with it the number of pathway figures published in biological papers. Each pathway figure carries rich biological information in the form of gene names and gene relations. However, manual curation of pathway figures requires tremendous time and labor. Advanced image understanding models may accelerate curation, but their accuracy still needs improvement. Because each pathway figure is associated with a paper, most of the gene names and gene relations in the figure also appear in the paper's text, so text mining can be used to improve the image recognition results. In this paper, we apply a fuzzy match method with different “gene dictionaries” to detect gene names, and we use gene co-occurrence in the plain text to suggest gene relations. We demonstrate that text mining methods improve the performance of image understanding for both gene name recognition and gene relation extraction. All data and code are available on GitHub (https://github.com/lyfer233/Text-Mining-Enhancements-for-Image-Recognition-of-Gene-Names-and-Gene-Relations).
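The two text mining steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that fuzzy matching means picking the dictionary entry with the smallest Levenshtein edit distance to an OCR token, and that co-occurrence is counted at the sentence level. The function names, the `max_distance` threshold, and the toy gene dictionary are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(ocr_token, gene_dictionary, max_distance=1):
    """Return the closest dictionary entry within max_distance, else None.

    max_distance is an assumed threshold, not a value from the paper.
    """
    token = ocr_token.upper()
    best, best_dist = None, max_distance + 1
    for gene in sorted(gene_dictionary):  # sorted for deterministic ties
        d = edit_distance(token, gene.upper())
        if d < best_dist:
            best, best_dist = gene, d
    return best


def cooccurrence_counts(text, genes):
    """Count sentence-level co-occurrences for each unordered gene pair."""
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[A-Za-z0-9]+", sentence.upper()))
        present = sorted(g for g in genes if g.upper() in words)
        counts.update(combinations(present, 2))
    return counts


# Toy dictionary and text, for illustration only.
GENES = {"MTOR", "AKT1", "TP53"}
TEXT = ("MTOR activates AKT1 in glioma cells. TP53 is unaffected. "
        "AKT1 and MTOR are both phosphorylated.")

corrected = fuzzy_match("MT0R", GENES)   # OCR confused the letter O with zero
pairs = cooccurrence_counts(TEXT, GENES)
```

In this sketch, a misread figure label such as `MT0R` is mapped back to a dictionary entry, and gene pairs that share sentences in the paper text receive higher counts, which could then be used to rank candidate relations detected in the figure.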
Acknowledgements
The authors would like to express their gratitude to Clement Essien and Drs. Richard Hammer and Dmitriy Shin for helpful discussions. The research is supported by the National Library of Medicine of the National Institutes of Health (NIH) under award 5R01LM013392.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ren, Y. et al. (2022). Text Mining Enhancements for Image Recognition of Gene Names and Gene Relations. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_11
DOI: https://doi.org/10.1007/978-3-031-20837-9_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20836-2
Online ISBN: 978-3-031-20837-9
eBook Packages: Computer Science, Computer Science (R0)