
Using fuzzy set theory and scale-free network properties to relate MEDLINE terms

Soft Computing

Abstract

Automated construction and annotation of biological networks is becoming increasingly important in bioinformatics as the amount of biological data grows. Central to this task are metrics for relating biological entities such as genes, diseases, signaling molecules and chemical compounds. Co-occurrence of terms within abstracts is widely used to establish tentative relationships because it is easy to use, implement and understand, and is reasonably accurate. However, it is also very imprecise: the cutoffs for how many co-occurrences are necessary to establish a relationship are arbitrary, and the nature of the relationship is generic. Because the co-occurrence frequencies of terms usually follow a scale-free distribution, this property can be exploited to define degrees of membership in fuzzy sets. Starting from the set of co-occurrences for any biomedical term (or its synonyms), relationships between terms are defined by the overlap of their sets, normalized by the area under the curve that the two sets share. The ability of this method to rank the relative specificity of biological relationships is tested by comparing cumulative term co-occurrences within 7.5 million MEDLINE abstracts against focused summaries of gene function and disease association in LocusLink. On average, the fuzzy set ranking (FSR) of the LocusLink associations fell within the top 0.6% of all potential associations, indicating a good correlation between domain overlap and the biological association between two terms.
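The abstract does not give the membership function or the exact normalization, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes that membership is the log-scaled co-occurrence count divided by the log-scaled count of the term's strongest partner. For the shared-area normalization it swaps in the fuzzy Jaccard ratio: the area under the minimum of two membership curves divided by the area under their maximum. It then reports a rank fraction analogous to the "top 0.6%" figure. The function names (`membership`, `fuzzy_overlap`, `rank_fraction`) and the toy counts are hypothetical.

```python
"""Minimal sketch: relating terms by the overlap of fuzzy co-occurrence sets.

ASSUMPTIONS (illustrative, not the paper's published formulas):
  * membership = log(1 + count) / log(1 + max count) for each partner term;
  * relatedness = fuzzy Jaccard ratio, sum(min) / sum(max), over the shared domain;
  * rank = fraction of candidate terms scoring at least as high as the target.
"""
import math
from typing import Dict

Profile = Dict[str, int]  # partner term -> number of co-occurring abstracts


def membership(counts: Profile) -> Dict[str, float]:
    """Convert raw co-occurrence counts into membership degrees in (0, 1]."""
    positive = {term: n for term, n in counts.items() if n > 0}
    if not positive:
        return {}
    # Log scaling (an assumption) damps the heavy tail of a scale-free
    # distribution so a few very frequent partners do not dominate the set.
    denom = math.log1p(max(positive.values()))
    return {term: math.log1p(n) / denom for term, n in positive.items()}


def fuzzy_overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Area shared by two fuzzy sets divided by the area they jointly cover."""
    domain = a.keys() | b.keys()
    shared = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in domain)
    covered = sum(max(a.get(x, 0.0), b.get(x, 0.0)) for x in domain)
    return shared / covered if covered else 0.0


def rank_fraction(query: str, target: str, profiles: Dict[str, Profile]) -> float:
    """Fraction of candidates scoring >= target against query (smaller = more specific)."""
    q = membership(profiles[query])
    scores = {
        other: fuzzy_overlap(q, membership(p))
        for other, p in profiles.items()
        if other != query
    }
    threshold = scores[target]
    return sum(1 for s in scores.values() if s >= threshold) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy counts, invented for illustration only.
    toy = {
        "geneX": {"apoptosis": 40, "p53": 12, "liver": 3},
        "diseaseY": {"apoptosis": 25, "p53": 8, "kidney": 2},
        "termZ": {"kidney": 30, "dialysis": 10},
    }
    print(rank_fraction("geneX", "diseaseY", toy))  # 0.5: best of two candidates
```

A smaller rank fraction means the target stands out more among all candidate partners. With real data, the candidates would be every term that co-occurs in MEDLINE rather than three toy entries.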



Author information

Correspondence to Jonathan D. Wren.


About this article

Cite this article

Wren, J. Using fuzzy set theory and scale-free network properties to relate MEDLINE terms. Soft Comput 10, 374–381 (2006). https://doi.org/10.1007/s00500-005-0497-5


  • DOI: https://doi.org/10.1007/s00500-005-0497-5
