Abstract
Automated construction and annotation of biological networks is becoming increasingly important in bioinformatics as the amount of biological data increases. Central to this task are metrics for relating biological entities such as genes, diseases, signaling molecules and chemical compounds. Co-occurrence of terms within abstracts is widely used to establish tentative relationships because it is easy to implement, use and understand, and is reasonably accurate. However, it is also very imprecise: the cutoff for how many co-occurrences are necessary to establish a relationship is arbitrary, and the nature of the relationship is generic. Since the frequency of co-occurrence for terms usually follows a scale-free distribution, this property can be exploited to define degree of membership in fuzzy sets. Beginning with a set of co-occurrences for any biomedical term (or its synonyms), relations are defined by the overlap of sets, normalized by the area under the curve that the two sets share. The ability of this method to rank the relative specificity of biological relationships is tested by comparing cumulative term co-occurrences within 7.5 million MEDLINE abstracts with focused summaries of gene function and disease association within LocusLink. On average, the fuzzy set ranking (FSR) was in the top 0.6% of all potential associations, showing a good correlation between domain overlap and the biological association between two terms.
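The set-overlap idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact membership function or normalization, so everything below is an assumption made for illustration: membership is taken as a term's co-occurrence count divided by the largest count in its set, and the overlap is scored as the fuzzy intersection (sum of minima) divided by the fuzzy union (sum of maxima). The function names `membership` and `fuzzy_overlap` and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Dict


def membership(cooccurrences: Dict[str, int]) -> Dict[str, float]:
    """Scale raw co-occurrence counts to membership degrees in [0, 1].

    Assumption: degree = count / largest count in the set. The paper may
    define membership differently.
    """
    peak = max(cooccurrences.values(), default=0)
    if peak == 0:
        return {}
    return {term: count / peak for term, count in cooccurrences.items()}


def fuzzy_overlap(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Score how strongly two co-occurrence sets overlap.

    Assumption: fuzzy intersection (sum of minima) divided by fuzzy union
    (sum of maxima), a stand-in for the normalization in the abstract.
    """
    mu_a, mu_b = membership(a), membership(b)
    terms = set(mu_a) | set(mu_b)
    intersection = sum(min(mu_a.get(t, 0.0), mu_b.get(t, 0.0)) for t in terms)
    union = sum(max(mu_a.get(t, 0.0), mu_b.get(t, 0.0)) for t in terms)
    return intersection / union if union else 0.0


# Hypothetical co-occurrence counts for two query terms.
gene_terms = {"insulin": 10, "glucose": 5}
disease_terms = {"insulin": 4, "glucose": 4, "obesity": 2}
score = fuzzy_overlap(gene_terms, disease_terms)
print(f"overlap score: {score:.3f}")
```

Scores from a function like this could be sorted to rank candidate associations for a query term. A ratio of intersection to union keeps the score in [0, 1] regardless of how many abstracts mention either term, which is one simple way to avoid an arbitrary co-occurrence cutoff.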
Cite this article
Wren, J. Using fuzzy set theory and scale-free network properties to relate MEDLINE terms. Soft Comput 10, 374–381 (2006). https://doi.org/10.1007/s00500-005-0497-5
DOI: https://doi.org/10.1007/s00500-005-0497-5