Abstract
In the post-genomic era, identifying the genes involved in human disease is one of the most important tasks. Disease phenotypes provide a window into gene function. Several approaches that identify disease-related genes from function annotations have been presented in recent years. Most of them, however, start from the function annotations of genes already known to be associated with a disease, so they cannot be used for diseases that have no known pathogenic genes or related function annotations. We have built a new system, CDGMiner, to predict genes associated with such diseases, which lack detailed function annotations. CDGMiner works in two main phases: text mining and functional similarity analysis. Its performance was tested with a set of 1506 genes involved in 1147 disease phenotypes derived from the OMIM database. On average, the target gene ranked within the top 13.60%, and in 40.70% of cases it ranked within the top 5%. CDGMiner shows promising performance compared with other existing tools.
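The two reported figures can be read as summary statistics over per-disease rankings. The sketch below is only an illustration of that reading, not the authors' evaluation code: it assumes each test case yields the 1-based rank of the target gene and the size of the ranked candidate list, and the names `rank_percentile` and `summarize` are hypothetical.

```python
from typing import Sequence, Tuple


def rank_percentile(rank: int, n_candidates: int) -> float:
    """Position of the target gene as a percentage of the ranked list.

    Assumption: rank is 1-based, so rank 1 of 100 gives 1.0 (top 1%).
    """
    if not 1 <= rank <= n_candidates:
        raise ValueError("rank must be between 1 and n_candidates")
    return 100.0 * rank / n_candidates


def summarize(results: Sequence[Tuple[int, int]],
              cutoff: float = 5.0) -> Tuple[float, float]:
    """Return (mean rank percentile, % of cases at or within the cutoff).

    results holds one (rank, n_candidates) pair per test case.
    """
    if not results:
        raise ValueError("results must not be empty")
    percentiles = [rank_percentile(r, n) for r, n in results]
    mean_pct = sum(percentiles) / len(percentiles)
    hits = sum(1 for p in percentiles if p <= cutoff)
    hit_rate = 100.0 * hits / len(percentiles)
    return mean_pct, hit_rate


# Made-up toy data, not taken from the paper: four test cases.
toy_results = [(2, 100), (10, 100), (30, 100), (5, 200)]
mean_pct, top5_rate = summarize(toy_results)
print(f"mean rank percentile: {mean_pct:.2f}%")
print(f"target in top 5%: {top5_rate:.2f}% of cases")
```

Under this reading, the abstract's 13.60% corresponds to `mean_pct` and its 40.70% to `top5_rate`, computed over the OMIM-derived test set.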
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yuan, F., Zhou, Y. (2008). CDGMiner: A New Tool for the Identification of Disease Genes by Text Mining and Functional Similarity Analysis. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_118
DOI: https://doi.org/10.1007/978-3-540-85984-0_118
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85983-3
Online ISBN: 978-3-540-85984-0
eBook Packages: Computer Science, Computer Science (R0)