
Hypothesis Discovery Exploiting Closed Chains of Relations

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 9430)

Abstract

The ever-growing literature in biomedicine makes it virtually impossible for individuals to grasp all the information relevant to their interests. Since even experts' knowledge is limited, important associations among key biomedical concepts may remain unnoticed in the flood of information. Discovering those hidden associations is called hypothesis discovery or literature-based discovery. This paper proposes an approach to this problem that takes advantage of closed, triangular chains of relations extracted from the existing literature. We treat such chains of relations as implicit rules for generating explanatory hypotheses. The hypotheses generated from the implicit rules are then compared with newer knowledge to assess their validity; validated hypotheses serve as positive examples for learning a regression model that ranks hypotheses. As a proof of concept, the proposed framework is empirically evaluated on real-world knowledge extracted from the biomedical literature. The results demonstrate that the framework is able to produce legitimate hypotheses and that the proposed ranking approach is more effective than previous work.
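The idea of treating a closed, triangular chain of relations as an implicit rule can be illustrated with a minimal sketch. It is not the chapter's implementation: the `Relation` type, the function names, and the "compound X / factor Y / process Z" relations are hypothetical, and only the actinomycin D triangle comes from the chapter's notes. The sketch assumes a rule is the triple of predicates found on a triangle a-b, b-c, a-c, applies it to chains whose a-c edge is missing, and checks the results against a later snapshot of relations. The regression-based ranking step is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    subject: str
    predicate: str
    obj: str


def _index_by_subject(relations):
    """Map each subject concept to the relations that start from it."""
    index = defaultdict(list)
    for rel in relations:
        index[rel.subject].append(rel)
    return index


def extract_rules(relations):
    """Collect (p_ab, p_bc, p_ac) predicate triples from closed triangles."""
    by_subject = _index_by_subject(relations)
    direct = defaultdict(set)  # (a, c) -> predicates observed between a and c
    for rel in relations:
        direct[(rel.subject, rel.obj)].add(rel.predicate)
    rules = set()
    for ab in relations:
        for bc in by_subject.get(ab.obj, []):
            if bc.obj in (ab.subject, ab.obj):
                continue  # skip degenerate chains (loops back onto a or b)
            for p_ac in direct.get((ab.subject, bc.obj), ()):
                rules.add((ab.predicate, bc.predicate, p_ac))
    return rules


def generate_hypotheses(relations, rules):
    """Apply rules to chains a-b-c whose a-c relation is not yet known."""
    known = set(relations)
    by_subject = _index_by_subject(relations)
    hypotheses = set()
    for ab in relations:
        for bc in by_subject.get(ab.obj, []):
            if bc.obj in (ab.subject, ab.obj):
                continue
            for p_ab, p_bc, p_ac in rules:
                if (ab.predicate, bc.predicate) == (p_ab, p_bc):
                    candidate = Relation(ab.subject, p_ac, bc.obj)
                    if candidate not in known:
                        hypotheses.add(candidate)
    return hypotheses


# Older knowledge: one closed triangle (from the chapter's notes) and one
# open chain built from hypothetical placeholder concepts.
older = [
    Relation("actinomycin D", "inhibits", "mRNA"),
    Relation("mRNA", "directs", "protein synthesis"),
    Relation("actinomycin D", "impairs", "protein synthesis"),
    Relation("compound X", "inhibits", "factor Y"),   # hypothetical
    Relation("factor Y", "directs", "process Z"),     # hypothetical
]
# Newer knowledge used for validation (hypothetical).
newer = {Relation("compound X", "impairs", "process Z")}

rules = extract_rules(older)
hypotheses = generate_hypotheses(older, rules)
validated = hypotheses & newer  # candidates for positive training examples

print(rules)
print(hypotheses)
print(validated)
```

In this toy setting the single triangle yields one rule, and the open chain yields one hypothesis that the newer snapshot confirms. Exact matching of predicates and concept names is a simplification made for the sketch.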

Notes

  1. http://www.ncbi.nlm.nih.gov/entrez.

  2. http://www.nlm.nih.gov/research/umls/.

  3. http://www.nlm.nih.gov/mesh/.

  4. http://semanticnetwork.nlm.nih.gov/.

  5. In fact, three relations, “actinomycin D inhibits mRNA”, “mRNA directs protein synthesis”, and “actinomycin D impairs protein synthesis”, were extracted from Medline, and this rule was acquired without manually coding any domain knowledge.


Acknowledgements

This work was partially supported by JSPS KAKENHI Grant Number 25330363 and by MEXT, Japan.

Author information

Corresponding author

Correspondence to Kazuhiro Seki.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Seki, K. (2015). Hypothesis Discovery Exploiting Closed Chains of Relations. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXII. Lecture Notes in Computer Science, vol 9430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48567-5_5

  • DOI: https://doi.org/10.1007/978-3-662-48567-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48566-8

  • Online ISBN: 978-3-662-48567-5

  • eBook Packages: Computer Science, Computer Science (R0)
