Abstract
Dynamic mutations of simple sequence repeats (SSRs) have been shown to affect normal gene function and to cause a range of genetic disorders. Several conserved and even partially functional SSR patterns have been discovered in inherited orthologous disease genes. To explore a wide range of SSRs in genetic diseases, a comprehensive system was constructed that identifies orthologous SSRs of disease genes through comparative genomics, using the Online Mendelian Inheritance in Man (OMIM) and NCBI HomoloGene databases as the fundamental sources of human genetic disease and homologous gene information. In addition, an efficient algorithm for searching SSR patterns was developed to provide annotated SSR information across various model species. By integrating these data resources and mining techniques, biologists and physicians can systematically retrieve novel and important conserved SSR information among orthologous disease genes. The proposed system, Orthologous SSR for Disease Genes (OSDG), is the first comprehensive framework for identifying orthologous SSRs as potential causative factors of genetic disorders and is freely available at http://osdg.cs.ntou.edu.tw/.
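The abstract does not describe the SSR search algorithm itself, so the following is only a minimal sketch of the general idea, not the OSDG implementation. It assumes perfect (mismatch-free) repeats, uses a regular-expression backreference in place of whatever search strategy the authors developed, and treats a motif as conserved when it appears as an SSR in every species. The function names, the thresholds `max_unit` and `min_repeats`, and the toy sequences are all hypothetical.

```python
import re
from typing import NamedTuple


class SSR(NamedTuple):
    motif: str    # repeated unit as observed, e.g. "CAG"
    start: int    # 0-based start offset in the sequence
    repeats: int  # number of consecutive copies of the motif


def find_perfect_ssrs(seq: str, max_unit: int = 6, min_repeats: int = 4) -> list[SSR]:
    """Scan a DNA string for perfect tandem repeats with unit length 1..max_unit."""
    seq = seq.upper()
    hits = []
    for unit in range(1, max_unit + 1):
        # Group 1 captures one unit; the backreference demands further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "AAA" or "CACA").
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append(SSR(motif, m.start(), len(m.group(0)) // unit))
    return sorted(hits, key=lambda s: s.start)


def canonical(motif: str) -> str:
    """Smallest rotation of a motif, so CAG, AGC and GCA compare as equal."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def conserved_motifs(ssrs_by_species: dict[str, list[SSR]]) -> set[str]:
    """Canonical motifs that occur as an SSR in every species' sequence."""
    motif_sets = [{canonical(s.motif) for s in ssrs}
                  for ssrs in ssrs_by_species.values()]
    return set.intersection(*motif_sets) if motif_sets else set()


# Toy sequences for illustration only; they are not real gene sequences.
human = "ATGTCT" + "CAG" * 6 + "TTCCA"
mouse = "ATGGCG" + "CAG" * 4 + "CCACCG"
shared = conserved_motifs({
    "human": find_perfect_ssrs(human),
    "mouse": find_perfect_ssrs(mouse),
})
```

Comparing canonical rotations matters because a left-to-right scan can report the same repeat tract in a shifted phase (for example `GCA` instead of `CAG`) depending on the flanking bases. A real pipeline would also need to handle imperfect repeats and positional alignment between orthologs, which this sketch leaves out.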
Additional information
This research was supported by the Center for Marine Bioenvironment and Biotechnology (CMBB) at National Taiwan Ocean University, Keelung, Taiwan, and by the National Science Council in Taiwan (NSC97-2627-B-019-003).
Cite this article
Chen, C., Chen, C., Shih, T. et al. Efficient algorithms for identifying orthologous simple sequence repeats of disease genes. J Syst Sci Complex 23, 906–916 (2010). https://doi.org/10.1007/s11424-010-0203-2
DOI: https://doi.org/10.1007/s11424-010-0203-2