Hypothesis Discovery Exploiting Closed Chains of Relations

Seki, Kazuhiro

doi:10.1007/978-3-662-48567-5_5

Chapter
First Online: 08 November 2015

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9430)

Abstract

The ever-growing literature in biomedicine makes it virtually impossible for individuals to grasp all the information relevant to their interests. Since even experts’ knowledge is limited, important associations among key biomedical concepts may remain unnoticed in the flood of information. Discovering those hidden associations is called hypothesis discovery or literature-based discovery. This paper proposes an approach to this problem that takes advantage of closed, triangular chains of relations extracted from the existing literature. We treat such chains of relations as implicit rules for generating explanatory hypotheses. The hypotheses generated from the implicit rules are then compared with newer knowledge to assess their validity and, if validated, they serve as positive examples for learning a regression model to rank hypotheses. As a proof of concept, the proposed framework is empirically evaluated on real-world knowledge extracted from the biomedical literature. The results demonstrate that the framework is able to produce legitimate hypotheses and that the proposed ranking approach is more effective than previous work.

Notes

1. http://www.ncbi.nlm.nih.gov/entrez.
2. http://www.nlm.nih.gov/research/umls/.
3. http://www.nlm.nih.gov/mesh/.
4. http://semanticnetwork.nlm.nih.gov/.
5. In fact, three relations, “actinomycin D inhibits mRNA”, “mRNA directs protein synthesis”, and “actinomycin D impairs protein synthesis”, were extracted from Medline, and this rule was acquired without manually coding any domain knowledge.
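
The closed chain in note 5 can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names, the exact matching of predicate pairs, and the extra subject "drug X" are assumptions made purely for illustration. The sketch finds closed triangles (a to b, b to c, a to c) among relation triples, records which predicate closes each predicate pair, and then applies that pattern to open chains to propose new relations.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# A relation is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]


def closed_chains(triples: List[Triple]) -> Iterator[Tuple[Triple, Triple, Triple]]:
    """Yield (a->b, b->c, a->c) relation triples that form a closed triangle."""
    by_subject: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    direct: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].append((p, o))
        direct[(s, o)].add(p)
    for a, p1, b in triples:
        for p2, c in by_subject.get(b, []):
            if c == a:
                continue  # skip two-node cycles
            for p3 in sorted(direct.get((a, c), ())):
                yield (a, p1, b), (b, p2, c), (a, p3, c)


def induce_rules(triples: List[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each predicate pair (p1, p2) to the predicates seen closing the triangle."""
    rules: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for (_, p1, _), (_, p2, _), (_, p3, _) in closed_chains(triples):
        rules[(p1, p2)].add(p3)
    return dict(rules)


def generate_hypotheses(
    triples: Iterable[Triple], rules: Dict[Tuple[str, str], Set[str]]
) -> Set[Triple]:
    """Apply the rules to open chains a->b->c and return relations not yet known."""
    triples = list(triples)
    known = set(triples)
    by_subject: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))
    hypotheses: Set[Triple] = set()
    for a, p1, b in triples:
        for p2, c in by_subject.get(b, []):
            if c == a:
                continue
            for p3 in rules.get((p1, p2), ()):
                if (a, p3, c) not in known:
                    hypotheses.add((a, p3, c))
    return hypotheses


# The three relations quoted in note 5.
KNOWN: List[Triple] = [
    ("actinomycin D", "inhibits", "mRNA"),
    ("mRNA", "directs", "protein synthesis"),
    ("actinomycin D", "impairs", "protein synthesis"),
]

rules = induce_rules(KNOWN)

# "drug X" is a hypothetical subject, not taken from the chapter.
hypotheses = generate_hypotheses(KNOWN + [("drug X", "inhibits", "mRNA")], rules)
print(rules)
print(hypotheses)
```

In this sketch a rule fires on any exact match of the two predicates. Validating the generated hypotheses against newer knowledge and ranking them with a regression model, as the abstract describes, are separate steps that the sketch does not cover.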

Acknowledgements

This work is partially supported by JSPS KAKENHI Grant Number 25330363 and MEXT, Japan.

Author information

Authors and Affiliations

Konan University, Kobe, Hyogo, 658-8501, Japan
Kazuhiro Seki

Corresponding author

Correspondence to Kazuhiro Seki.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner

About this chapter

Cite this chapter

Seki, K. (2015). Hypothesis Discovery Exploiting Closed Chains of Relations. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXII. Lecture Notes in Computer Science, vol 9430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48567-5_5

DOI: https://doi.org/10.1007/978-3-662-48567-5_5
Published: 08 November 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48566-8
Online ISBN: 978-3-662-48567-5
eBook Packages: Computer Science, Computer Science (R0)