Abstract
Chromatin immunoprecipitation followed by sequencing of the recovered fragments (ChIP-Seq) is subject to considerable uncertainty, owing to execution errors, technical and computational limitations, and the inherent complexity of the biological systems under study. Consequently, one of the challenges researchers face when analyzing ChIP-Seq results is to elucidate the pattern behind the obtained sequences (peaks) amid a huge amount of data and noise. A significant number of statistical tools and algorithms have been proposed in recent years to address this problem. The method presented in this paper innovates by taking advantage of both the structure of the data obtained in these experiments (peaks) and existing resources. The motif, or pattern, obtained from these peaks by this procedure is considered the most characteristic motif. The method also yields quality metrics for the analyzed experiment. It has been validated with data retrieved from public repositories.
This work has been funded by the Spanish Ministry of Economy, Industry and Competitiveness, the European Regional Development Fund (ERDF) Programme through grant TIN2017-85949-C2-1-R and by the Spanish Ministry of Education, Culture and Sports through fellowship FPU014/06303.
Data Availability
The data generated in this work is available at https://github.com/gines-almagro/ChIP-Seq-motif.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Almagro-Hernández, G., Fernández-Breis, J.T. (2020). Discovering the Most Characteristic Motif from a Set of Peak Sequences. In: Rojas, I., Valenzuela, O., Rojas, F., Herrera, L., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2020. Lecture Notes in Computer Science, vol 12108. Springer, Cham. https://doi.org/10.1007/978-3-030-45385-5_40
DOI: https://doi.org/10.1007/978-3-030-45385-5_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45384-8
Online ISBN: 978-3-030-45385-5
eBook Packages: Computer Science (R0)