Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

Cappelletti, Luca; Petrini, Alessandro; Gliozzo, Jessica; Casiraghi, Elena; Schubach, Max; Kircher, Martin; Valentini, Giorgio

doi:10.1007/978-3-030-45385-5_54

Conference paper
First Online: 30 April 2020

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 12108)

Abstract

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA remain an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, deep-learning techniques have recently obtained state-of-the-art results on this challenging task. In this study, we provide additional evidence that feed-forward neural networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization, applied to both FFNN and CNN models, can significantly improve deep neural network performance by automatically finding models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions in the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend evaluating generalization performance on the actual imbalanced data, using examples that were not used to train the models.
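The abstract's caution about balanced test data can be illustrated with a small worked example. The sketch below is not taken from the paper: the true-positive rate, false-positive rate, and class counts are hypothetical values chosen only to show how the precision of one fixed classifier changes when the same decision rule is scored on a balanced versus an imbalanced test set.

```python
def precision_at_prevalence(tpr: float, fpr: float, n_pos: int, n_neg: int) -> float:
    """Precision of a classifier with fixed TPR and FPR on a test set
    containing n_pos positive and n_neg negative examples."""
    true_pos = tpr * n_pos    # expected number of correctly flagged positives
    false_pos = fpr * n_neg   # expected number of negatives flagged by mistake
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: these rates are illustrative assumptions,
# not results reported in the paper.
TPR, FPR = 0.80, 0.05

# Artificially balanced test set (1:1) vs. an imbalanced one (1:100).
balanced = precision_at_prevalence(TPR, FPR, n_pos=1_000, n_neg=1_000)
imbalanced = precision_at_prevalence(TPR, FPR, n_pos=1_000, n_neg=100_000)

print(f"precision on balanced test set:   {balanced:.3f}")    # 0.941
print(f"precision on imbalanced test set: {imbalanced:.3f}")  # 0.138
```

Under these assumed rates the classifier looks far more precise on the balanced set than on the imbalanced one, although its behaviour on each individual example is unchanged. This is one way to read the recommendation to evaluate on data that keep the real class ratio.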

Notes

1. ENCODE Data at ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC.
2. ENCODE Data at ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC; ENCODE fold-change values are described at https://sites.google.com/site/anshulkundaje.
3. https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/.
4. https://genome.ucsc.edu/.
5. https://github.com/LucaCappelletti94/ucsc_genomes_downloader.
6. For computing Multiple Correspondence Analysis we used the Python package available at https://github.com/esafak/mca.
7. https://scikit-optimize.github.io/.

Author information

Authors and Affiliations

AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi & Giorgio Valentini
Berlin Institute of Health (BIH), Berlin, Germany
Max Schubach & Martin Kircher
Charité – Universitätsmedizin Berlin, Berlin, Germany
Max Schubach & Martin Kircher
Department of Dermatology, Fondazione IRCCS Ca’ Granda - Ospedale Maggiore Policlinico, Milan, Italy
Jessica Gliozzo

Corresponding author

Correspondence to Giorgio Valentini.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Granada, Granada, Spain
Olga Valenzuela
University of Granada, Granada, Spain
Fernando Rojas
University of Granada, Granada, Spain
Luis Javier Herrera
University of Chicago and Fundacion Progreso y Salud, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Cappelletti, L. et al. (2020). Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_54

DOI: https://doi.org/10.1007/978-3-030-45385-5_54
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45384-8
Online ISBN: 978-3-030-45385-5
eBook Packages: Computer Science, Computer Science (R0)
