Motif Discovery via Convolutional Networks with K-mer Embedding

Wang, Dailun; Zhang, Qinhu; Yuan, Chang-An; Qin, Xiao; Huang, Zhi-Kai; Shang, Li

doi:10.1007/978-3-030-26969-2_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11644)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

With the rapid development of deep learning, discriminative motif discovery methods based on deep neural networks are gradually becoming mainstream and have brought large improvements in prediction accuracy. In this paper, we propose a convolutional neural network based architecture (eCNN) that combines an embedding layer with GloVe. First, eCNN divides each sequence in a ChIP-seq dataset into multiple subsequences, called k-mers, using a sliding window; it then encodes the k-mers as relatively low-dimensional vectors with GloVe; finally, it scores each vector using multiple convolutional networks. Experiments show that our architecture achieves good results on the task of motif discovery.
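
The first two steps of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_kmers` and `cooccurrence_counts`, the values of `k`, `stride`, and `window`, and the use of unweighted counts are all assumptions made for illustration (the abstract does not give these details, and GloVe implementations commonly weight each pair by inverse distance). The sketch only shows how a sliding window yields k-mers and how the k-mer co-occurrence statistics that a GloVe-style model is fitted on could be gathered.

```python
from collections import Counter


def split_kmers(sequence, k=3, stride=1):
    """Slide a window of width k over the sequence and return the k-mers.

    k and stride are illustrative defaults, not values from the paper.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def cooccurrence_counts(kmers, window=2):
    """Count ordered (center, context) k-mer pairs within `window` positions.

    Simplification: every pair inside the window counts as 1, with no
    distance weighting.
    """
    counts = Counter()
    for i, center in enumerate(kmers):
        lo = max(0, i - window)
        hi = min(len(kmers), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, kmers[j])] += 1
    return counts


kmers = split_kmers("ACGTAC", k=3)
print(kmers)  # ['ACG', 'CGT', 'GTA', 'TAC']

pairs = cooccurrence_counts(kmers, window=1)
print(pairs[("ACG", "CGT")])  # 1
```

In a full pipeline, the resulting counts would be used to fit embedding vectors, and each sequence would then be represented as the ordered list of its k-mer vectors before being passed to the convolutional layers.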

Acknowledgements

This work was supported by the grants of the National Science Foundation of China, Nos. 61861146002, 61520106006, 61772370, 61873270, 61702371, 61672382, 61672203, 61572447, 61772357, and 61732012, China Post-doctoral Science Foundation Grant, No. 2017M611619, and supported by “BAGUI Scholar” Program and the Scientific & Technological Base and Talent Special Program, GuiKe AD18126015 of the Guangxi Zhuang Autonomous Region of China.

Author information

Authors and Affiliations

Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
Dailun Wang
Science Computing and Intelligent Information Processing of GuangXi Higher Education Key Laboratory, Nanning Normal University, Nanning, Guangxi, China
Qinhu Zhang, Chang-An Yuan & Xiao Qin
College of Mechanical and Electrical Engineering, Nanchang Institute of Technology, Nanchang, 330099, Jiangxi, China
Zhi-Kai Huang
Department of Communication Technology, College of Electronic Information Engineering, Suzhou Vocational University, Suzhou, 215104, Jiangsu, China
Li Shang

Corresponding author

Correspondence to Dailun Wang.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Nanchang Institute of Technology, Nanchang, China
Zhi-Kai Huang

About this paper

Cite this paper

Wang, D., Zhang, Q., Yuan, CA., Qin, X., Huang, ZK., Shang, L. (2019). Motif Discovery via Convolutional Networks with K-mer Embedding. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_36

DOI: https://doi.org/10.1007/978-3-030-26969-2_36
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science, Computer Science (R0)
