Abstract
Identifying overrepresented motifs in a collection of biological sequences remains a relevant and challenging problem in computational biology. Currently popular motif discovery methods are based on statistical learning theory. This paper explores a machine-learning approach to the problem, based on a Self-Organizing Map (SOM) in which the weight vectors of the output-layer neurons are replaced by position weight matrices. The approach can be used to characterise features present in a set of sequences and can therefore aid the discovery of overrepresented motifs. It is demonstrated on both real and simulated biological sequence datasets.
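To make the idea of a SOM whose nodes are position weight matrices more concrete, the following is a minimal sketch and not the authors' implementation. It assumes DNA sequences over the alphabet ACGT, a rectangular grid, log-likelihood scoring to pick the best-matching node, a Gaussian neighbourhood, and a batch-style update with pseudocounts. All function names and parameters (`train_som`, `sigma`, `pseudo`, and so on) are hypothetical choices made for illustration.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: DNA input containing only these four symbols


def kmers(seqs, k):
    """Return every overlapping window of length k from each sequence."""
    return [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]


def random_pwm(k, rng):
    """Build a k-column PWM with slightly perturbed, near-uniform columns."""
    pwm = []
    for _ in range(k):
        col = [1.0 + rng.random() for _ in ALPHABET]
        total = sum(col)
        pwm.append([c / total for c in col])
    return pwm


def log_likelihood(pwm, kmer):
    """Log-probability of a k-mer under a PWM (columns treated as independent)."""
    return sum(math.log(col[ALPHABET.index(b)]) for col, b in zip(pwm, kmer))


def best_node(grid, kmer):
    """Grid position whose PWM gives the k-mer the highest log-likelihood."""
    return max(grid, key=lambda pos: log_likelihood(grid[pos], kmer))


def train_som(seqs, k, rows, cols, epochs=10, sigma=1.0, pseudo=0.1, seed=0):
    """Batch-style SOM training in which each node is a PWM (illustrative only)."""
    rng = random.Random(seed)
    grid = {(r, c): random_pwm(k, rng) for r in range(rows) for c in range(cols)}
    data = kmers(seqs, k)
    for epoch in range(epochs):
        # Shrink the neighbourhood width over time, with a small lower bound.
        width = max(sigma * (1.0 - epoch / epochs), 0.1)
        # Pseudocounts keep every probability strictly positive.
        counts = {pos: [[pseudo] * len(ALPHABET) for _ in range(k)] for pos in grid}
        for km in data:
            br, bc = best_node(grid, km)
            for (r, c), cnt in counts.items():
                # Gaussian weight by grid distance to the best-matching node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * width * width))
                for i, b in enumerate(km):
                    cnt[i][ALPHABET.index(b)] += h
        # Renormalise the weighted counts into column probabilities.
        grid = {pos: [[x / sum(col) for x in col] for col in cnt]
                for pos, cnt in counts.items()}
    return grid


if __name__ == "__main__":
    toy = ["ACGTACGTTGACA", "TTGACAGGCCTA", "CCTTGACATTAG"]  # made-up sequences
    som = train_som(toy, k=6, rows=2, cols=2, epochs=5)
    print(best_node(som, "TTGACA"))
```

After training, nodes that attract many similar k-mers end up with sharply peaked columns, which is one way such a map could flag candidate overrepresented motifs. Judging whether a node is significant would need a background model, which this sketch omits.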
Cite this article
Mahony, S., Hendrix, D., Smith, T.J. et al. Self-Organizing Maps of Position Weight Matrices for Motif Discovery in Biological Sequences. Artif Intell Rev 24, 397–413 (2005). https://doi.org/10.1007/s10462-005-9011-9
DOI: https://doi.org/10.1007/s10462-005-9011-9