DOI: 10.1145/3608164.3608194
research-article

Hypothesis Generation from Literature for Advancing Biological Mechanism Research: A Perspective

Published: 07 November 2023

Abstract

The volume of published biomedical literature has grown significantly, with a sharp increase in recent years, and keeping pace with new knowledge in this complex and specialized field is challenging. Hypothesis generation is a literature-based discovery approach that uses existing literature to automatically identify implicit biomedical associations and provide reasonable predictions for future research. Despite its potential, current hypothesis generation methods face challenges when applied to research on biological mechanisms. In this perspective paper, we provide an overview of existing hypothesis generation approaches and examine their limitations in the context of biological mechanism research. We propose practical solutions to overcome these challenges and highlight the potential of hypothesis generation to advance our understanding of biological mechanisms.

      Published In

      ICBBT '23: Proceedings of the 2023 15th International Conference on Bioinformatics and Biomedical Technology
      May 2023
      313 pages
      ISBN: 9798400700385
      DOI: 10.1145/3608164
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. bioinformatics
      2. biomedical knowledge mining
      3. machine learning

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICBBT 2023
