DOI: 10.1145/1947940.1948030
research-article

Planted (l,d) motif finding with allowable mismatches using kernel based approach

Published: 12 February 2011

Abstract

Recent years have seen growing interest in discovering motifs: significant patterns in biological sequences that correspond to some structural and/or functional feature of the biomolecule. Motif discovery has important applications in determining regulatory sites, splice sites, and promoter sequences, and in drug target identification. Identifying a motif is challenging because it occurs in different sequences in various mutated forms. Despite extensive study over the last few years using statistical, exhaustive, heuristic, and other approaches, the problem is far from satisfactorily solved. In this paper, we consider the planted (l,d) motif search problem in a given set of DNA sequences using a kernel-based approach. The proposed kernel is evaluated on synthetic data and on real data sets from different organisms such as yeast and worm. The results on these data sets indicate improved performance of the proposed kernel, which allows classification of DNA sequences with larger motif lengths.
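
For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch, not the kernel proposed in the paper, whose definition is not given on this page. It assumes the usual reading of the planted (l,d) problem (a motif of length l occurs in a sequence with at most d substitutions) and shows two standard string kernels from the cited literature: the k-spectrum kernel and a brute-force mismatch kernel in the style of Leslie et al. All function names and the DNA alphabet default are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def contains_planted(seq: str, motif: str, d: int) -> bool:
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(seq[i:i + l], motif) <= d
               for i in range(len(seq) - l + 1))


def spectrum(seq: str, k: int) -> Counter:
    """Counts of every contiguous k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int) -> int:
    """Inner product of the k-spectrum feature vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())


def mismatch_features(seq: str, k: int, m: int, alphabet: str = "ACGT") -> Counter:
    """Each k-mer in seq contributes to every k-mer within m mismatches of it.

    Brute-force enumeration over alphabet**k; only practical for small k.
    """
    feats = Counter()
    candidates = ["".join(p) for p in product(alphabet, repeat=k)]
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for cand in candidates:
            if hamming(window, cand) <= m:
                feats[cand] += 1
    return feats


def mismatch_kernel(s: str, t: str, k: int, m: int) -> int:
    """Inner product of the (k, m)-mismatch feature vectors of s and t."""
    fs, ft = mismatch_features(s, k, m), mismatch_features(t, k, m)
    return sum(n * ft[cand] for cand, n in fs.items())
```

With m = 0 the mismatch kernel reduces to the spectrum kernel. Allowing mismatches lets two sequences that carry differently mutated copies of the same motif still share features, which is the intuition behind using such kernels with a support vector machine for this problem. A kernel matrix built this way could, for instance, be passed to an SVM implementation that accepts precomputed kernels.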

    Published In

    ICCCS '11: Proceedings of the 2011 International Conference on Communication, Computing & Security
    February 2011
    656 pages
    ISBN:9781450304641
    DOI:10.1145/1947940
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. planted motif
    2. spectrum kernel
    3. support vector machine

    Qualifiers

    • Research-article

    Conference

    ICCCS '11
