Mining strong relevance between heterogeneous entities from unstructured biomedical data

Ji, Ming; He, Qi; Han, Jiawei; Spangler, Scott

doi:10.1007/s10618-014-0396-4

Published: 05 February 2015

Volume 29, pages 976–998, (2015)

Data Mining and Knowledge Discovery

Ming Ji¹,
Qi He²,
Jiawei Han¹ &
…
Scott Spangler³

Abstract

Huge volumes of biomedical text discussing different biomedical entities are generated every day. Hidden in these unstructured data are strong relevance relationships between those entities, which are critical for many applications, including building knowledge bases for the biomedical domain and semantic search among biomedical entities. In this paper, we study the problem of discovering strong relevance between heterogeneous typed biomedical entities from massive biomedical text data. We first build an entity correlation graph from the data, in which the collection of paths linking two heterogeneous entities offers rich semantic context for their relationship, especially those paths that follow the patterns of the top-\(k\) meta paths inferred from the data. Guided by such meta paths, we design a novel relevance measure, named \({\mathsf {EntityRel}}\), to compute the strong relevance between two heterogeneous entities. Our intuition is that two entities of heterogeneous types are strongly relevant if they have strong direct links or if they are closely linked to other strongly relevant heterogeneous entities along paths following the selected patterns. We provide experimental results on mining strong relevance between drugs and diseases. More than 20 million MEDLINE abstracts and 5 types of biological entities (Drug, Disease, Compound, Target, MeSH) are used to construct the entity correlation graph. A prototype drug search engine for disease queries is implemented. Extensive comparisons are made against multiple state-of-the-art methods on Drug–Disease relevance discovery.
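
To make the idea of scoring two entities along a meta path concrete, the following is a minimal sketch in Python. It is not the paper's EntityRel measure, whose definition the abstract does not give; it is a plain meta-path-constrained weighted walk sum, swapped in for illustration only. The graph, the node names, the edge weights, and the function names (build_adjacency, meta_path_score) are all hypothetical.

```python
from collections import defaultdict

# Toy, made-up data (assumption): node name -> entity type.
NODE_TYPE = {
    "drug_a": "Drug",
    "drug_b": "Drug",
    "target_x": "Target",
    "target_y": "Target",
    "disease_p": "Disease",
}

# Undirected weighted edges (assumption): weights stand in for some
# correlation strength between two entities, e.g. derived from co-occurrence.
EDGES = [
    ("drug_a", "target_x", 0.8),
    ("drug_a", "target_y", 0.5),
    ("drug_b", "target_y", 0.2),
    ("target_x", "disease_p", 0.9),
    ("target_y", "disease_p", 0.4),
    ("drug_a", "disease_p", 0.3),
]


def build_adjacency(edges):
    """Return a symmetric adjacency map: node -> {neighbor: weight}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def meta_path_score(adj, node_type, source, target, meta_path):
    """Sum, over all walks from source to target whose node types follow
    meta_path, of the product of the edge weights along the walk."""
    if node_type[source] != meta_path[0]:
        return 0.0
    # frontier maps each node reached so far to its accumulated score.
    frontier = {source: 1.0}
    for wanted_type in meta_path[1:]:
        next_frontier = defaultdict(float)
        for node, score in frontier.items():
            for neighbor, weight in adj[node].items():
                if node_type[neighbor] == wanted_type:
                    next_frontier[neighbor] += score * weight
        frontier = next_frontier
    return frontier.get(target, 0.0)


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for path in (["Drug", "Disease"], ["Drug", "Target", "Disease"]):
        for drug in ("drug_a", "drug_b"):
            score = meta_path_score(adj, NODE_TYPE, drug, "disease_p", path)
            print(drug, "-".join(path), round(score, 3))
```

In this toy graph, drug_a reaches disease_p both through a direct link and through two targets, while drug_b reaches it only through one weak Drug-Target-Disease walk, so drug_a scores higher under either pattern. How such per-pattern scores would be combined, and how the top-\(k\) meta paths are chosen, is left open here because the abstract does not specify it.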

Notes

http://www.nlm.nih.gov/bsd/pmresources.html.
http://www.accessdata.fda.gov/scripts/cder/drugsatfda/.
http://www.obofoundry.org/cgi-bin/detail.cgi?id=disease_ontology.
http://www.ebi.ac.uk/chebi/. Note that drugs are a subset of compounds. In this paper, we treat them as distinct entity types because they originate from separate, independent sources.
http://www.nlm.nih.gov/mesh/.
https://www.ebi.ac.uk/chembl/.
http://www.accessdata.fda.gov/scripts/cder/ob/default.cfm. Among all the relevance relationships between different types of biological entities, we show the discovery results for therapeutic relationships as an example, since these results are easy to evaluate against the FDA's Orange Book.
The hit disease “acne vulgaris” is its synonym.

Acknowledgments

Research was sponsored in part by the Army Research Lab, under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), National Science Foundation IIS-1017362, IIS-1320617, IIS-1354329, HDTRA1-10-1-0120, and NIH Big Data to Knowledge (BD2K) (U54).

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL, USA
Ming Ji & Jiawei Han
LinkedIn Inc., Mountain View, CA, USA
Qi He
IBM Almaden Research Center, 650 Harry Road, San Jose, CA, USA
Scott Spangler

Corresponding author

Correspondence to Ming Ji.

Additional information

Responsible editors: Fei Wang, Gregor Stiglic, Ian Davidson, Zoran Obradovic.

This work was done when the first author was doing an internship at IBM Almaden Research Center.

About this article

Cite this article

Ji, M., He, Q., Han, J. et al. Mining strong relevance between heterogeneous entities from unstructured biomedical data. Data Min Knowl Disc 29, 976–998 (2015). https://doi.org/10.1007/s10618-014-0396-4

Received: 02 March 2014
Accepted: 17 November 2014
Published: 05 February 2015
Issue Date: July 2015
DOI: https://doi.org/10.1007/s10618-014-0396-4
