CDGMiner: A New Tool for the Identification of Disease Genes by Text Mining and Functional Similarity Analysis

  • Conference paper
Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence (ICIC 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5227)

Abstract

In the post-genomic era, identifying the genes involved in human disease is one of the most important tasks. Disease phenotypes provide a window into gene function. Several approaches that identify disease-related genes from function annotations have been presented in recent years. Most of them, however, start from the function annotations of genes already known to be associated with a disease, and therefore cannot be applied to diseases that have no known pathogenic genes or related function annotations. We have built a new system, CDGMiner, to predict genes associated with such diseases, which lack detailed function annotations. CDGMiner works in two main phases: text mining and functional similarity analysis. Its performance was tested on a set of 1506 genes involved in 1147 disease phenotypes derived from the OMIM database. Our results show that, on average, the target gene ranked in the top 13.60%, and that it ranked in the top 5% in 40.70% of cases. CDGMiner shows promising performance compared to other existing tools.
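The two summary figures in the abstract (average position of the target gene, and the share of cases in which it falls within the top 5%) can be illustrated with a minimal sketch. This is not CDGMiner's code: the function names, the `(rank, list_size)` input format, and the assumption that a gene's position is expressed as `rank / list_size` are all hypothetical, chosen only to show how such statistics are typically computed from ranked candidate lists.

```python
from typing import Sequence, Tuple


def rank_percentile(rank: int, n_candidates: int) -> float:
    """Position of the target gene as a percentage of the ranked list.

    Assumption: rank 1 is the best-scoring candidate, and the percentile
    is simply rank / list size. The paper may define it differently.
    """
    if not 1 <= rank <= n_candidates:
        raise ValueError("rank must lie between 1 and n_candidates")
    return 100.0 * rank / n_candidates


def summarize(results: Sequence[Tuple[int, int]],
              cutoff: float = 5.0) -> Tuple[float, float]:
    """Return (mean rank percentile, % of cases within the top `cutoff` %).

    `results` holds one hypothetical (rank, n_candidates) pair per test case.
    """
    if not results:
        raise ValueError("results must not be empty")
    percentiles = [rank_percentile(r, n) for r, n in results]
    mean_pct = sum(percentiles) / len(percentiles)
    hits = sum(1 for p in percentiles if p <= cutoff)
    hit_rate = 100.0 * hits / len(percentiles)
    return mean_pct, hit_rate


# Toy data only, not taken from the paper.
toy_results = [(1, 100), (5, 100), (20, 100), (50, 200)]
mean_pct, hit_rate = summarize(toy_results)
print(f"mean rank percentile: {mean_pct:.2f}%")
print(f"within top 5%: {hit_rate:.2f}% of cases")
```

Under this reading, the reported results would correspond to a mean rank percentile of 13.60 and a top-5% hit rate of 40.70 over the test cases.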


Editor information

De-Shuang Huang Donald C. Wunsch II Daniel S. Levine Kang-Hyun Jo

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yuan, F., Zhou, Y. (2008). CDGMiner: A New Tool for the Identification of Disease Genes by Text Mining and Functional Similarity Analysis. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_118

  • DOI: https://doi.org/10.1007/978-3-540-85984-0_118

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85983-3

  • Online ISBN: 978-3-540-85984-0

  • eBook Packages: Computer Science, Computer Science (R0)
