Abstract
The Fisher exact test score (FETS) and its variants are based on the hypergeometric distribution, which makes them a natural measure of the enrichment level of transcription factor binding sites (TFBSs). Several widely used discriminative motif discovery methods, such as HOMER and DREME, have chosen them as objective functions. Although FETS is highly efficient and universal, it is a non-smooth, non-differentiable function, so it cannot be optimized directly by standard numerical methods. To work around this, current methods that optimize FETS either restrict the search to a discrete domain or introduce external variables. Both choices hurt precision and fail to exploit the full potential of the input sequences for generating motifs. In this paper, we propose an approach that learns the motif parameters directly in continuous space, using FETS as the objective function. We find that when the objective is optimized coordinate-wise, it becomes a piecewise constant function in each resulting sub-problem, so the optimal value can be found exactly and efficiently. Furthermore, a key step in every iteration of optimizing FETS is finding the most statistically significant score among tens of thousands of Fisher's exact test scores, which is solved efficiently by a 'lookahead' technique. Experiments on ENCODE ChIP-seq data demonstrate the performance of the proposed method.
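To make the piecewise constant behaviour concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the score is the negative natural log of a one-sided hypergeometric tail probability, and it uses a single match-score threshold as a stand-in for one coordinate of the motif parameters. The function names `fisher_enrichment_score` and `best_threshold` are hypothetical. Because the hit counts change only when the threshold crosses an observed sequence score, evaluating the objective at those breakpoints is enough to find the exact optimum of this one-dimensional sub-problem.

```python
from math import comb, log


def fisher_enrichment_score(pos_hits, pos_total, neg_hits, neg_total):
    """Negative log of the one-sided hypergeometric tail P(X >= pos_hits).

    Assumed scoring convention for illustration: larger values mean the
    motif hits are more enriched in the positive set.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    upper = min(hits, pos_total)
    # Sum the hypergeometric numerators with exact integer arithmetic.
    tail = sum(
        comb(pos_total, k) * comb(neg_total, hits - k)
        for k in range(pos_hits, upper + 1)
    )
    denom = comb(total, hits)
    # Take logs separately so very small p-values do not underflow.
    return -(log(tail) - log(denom))


def best_threshold(pos_scores, neg_scores):
    """Exact search over one threshold (hypothetical single coordinate).

    The objective is constant between consecutive observed scores, so
    only the observed scores need to be tried as thresholds.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best_score, best_t = float("-inf"), None
    for t in candidates:
        pos_hits = sum(s >= t for s in pos_scores)
        neg_hits = sum(s >= t for s in neg_scores)
        score = fisher_enrichment_score(
            pos_hits, len(pos_scores), neg_hits, len(neg_scores)
        )
        if score > best_score:
            best_score, best_t = score, t
    return best_score, best_t
```

Calling `best_threshold([5, 6, 7, 8], [1, 2, 3, 4])` scans eight breakpoints and selects the threshold that separates the two sets. This naive scan recomputes every score from scratch; the 'lookahead' technique mentioned in the abstract addresses the cost of this step, and its details are not reproduced here.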
Acknowledgments
This work was supported by grants from the National Science Foundation of China, Nos. 61532008, 61672203, 61402334, 61472282, 61520106006, 31571364, U1611265, 61472280, 61472173, 61572447, 61373098 and 61672382, and by China Postdoctoral Science Foundation Grant No. 2016M601646.
© 2017 Springer International Publishing AG
Cite this paper
Li, N. (2017). Discriminative Motif Elicitation via Maximization of Statistical Overpresentation. In: Huang, DS., Bevilacqua, V., Premaratne, P., Gupta, P. (eds) Intelligent Computing Theories and Application. ICIC 2017. Lecture Notes in Computer Science, vol 10361. Springer, Cham. https://doi.org/10.1007/978-3-319-63309-1_45
DOI: https://doi.org/10.1007/978-3-319-63309-1_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63308-4
Online ISBN: 978-3-319-63309-1
eBook Packages: Computer Science, Computer Science (R0)