Abstract
The ever-growing literature in biomedicine makes it virtually impossible for individuals to grasp all the information relevant to their interests. Since even experts’ knowledge is limited, important associations among key biomedical concepts may remain unnoticed in the flood of information. Discovering those hidden associations is called hypothesis discovery or literature-based discovery. This paper proposes an approach to this problem that takes advantage of closed, triangular chains of relations extracted from the existing literature. We treat such chains of relations as implicit rules for generating explanatory hypotheses. The hypotheses generated from the implicit rules are then compared with newer knowledge to assess their validity and, if validated, they serve as positive examples for learning a regression model to rank hypotheses. As a proof of concept, the proposed framework is empirically evaluated on real-world knowledge extracted from the biomedical literature. The results demonstrate that the framework is able to produce legitimate hypotheses and that the proposed ranking approach is more effective than previous work.
Notes
In fact, three relations, “actinomycin D inhibits mRNA”, “mRNA directs protein synthesis”, and “actinomycin D impairs protein synthesis”, were extracted from Medline, and this rule was acquired without manually coding any domain knowledge.
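The idea of a closed chain acting as an implicit rule can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names `find_rules` and `generate_hypotheses`, the predicate-level form of a rule, and the two "drug X" triples are assumptions made only for illustration. The first three triples are the relations quoted in the note above. Validation against newer knowledge and the regression-based ranking are not covered here.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples. The first three are the relations
# quoted in the note; the last two are hypothetical and exist only to show
# how a learned rule could be applied to an open chain.
TRIPLES = [
    ("actinomycin D", "inhibits", "mRNA"),
    ("mRNA", "directs", "protein synthesis"),
    ("actinomycin D", "impairs", "protein synthesis"),
    ("drug X", "inhibits", "molecule Y"),    # hypothetical
    ("molecule Y", "directs", "process Z"),  # hypothetical
]


def _index_by_subject(triples):
    """Map each subject to the list of (predicate, object) pairs it appears with."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def find_rules(triples):
    """Collect predicate-level rules (p1, p2, p3) from closed triangles
    a -p1-> b, b -p2-> c, a -p3-> c (an assumed simplification)."""
    by_subject = _index_by_subject(triples)
    direct = defaultdict(set)
    for s, p, o in triples:
        direct[(s, o)].add(p)

    rules = set()
    for a, p1, b in triples:
        for p2, c in by_subject.get(b, []):
            if c == a:
                continue  # skip two-node cycles
            for p3 in direct.get((a, c), ()):
                rules.add((p1, p2, p3))
    return rules


def generate_hypotheses(triples, rules):
    """Apply each rule to open chains a -p1-> b -p2-> c and propose a -p3-> c
    whenever that triple is not already known."""
    known = set(triples)
    by_subject = _index_by_subject(triples)

    hypotheses = set()
    for a, p1, b in triples:
        for p2, c in by_subject.get(b, []):
            if c == a:
                continue
            for r1, r2, r3 in rules:
                if (r1, r2) == (p1, p2) and (a, r3, c) not in known:
                    hypotheses.add((a, r3, c))
    return hypotheses


rules = find_rules(TRIPLES)
hypotheses = generate_hypotheses(TRIPLES, rules)
```

In this toy setting the closed triangle around actinomycin D yields a single rule, which then proposes one new triple for the hypothetical open chain. A real system would additionally have to normalise concepts and predicates and score the candidates, which this sketch leaves out.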
Acknowledgements
This work is partially supported by JSPS KAKENHI Grant Numbers 25330363 and MEXT, Japan.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Seki, K. (2015). Hypothesis Discovery Exploiting Closed Chains of Relations. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXII. Lecture Notes in Computer Science, vol 9430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48567-5_5
DOI: https://doi.org/10.1007/978-3-662-48567-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48566-8
Online ISBN: 978-3-662-48567-5
eBook Packages: Computer Science, Computer Science (R0)