Abstract
We present regLM, a framework to design synthetic cis-regulatory elements (CREs) with desired properties, such as high, low, or cell type-specific activity, using autoregressive language models in conjunction with supervised sequence-to-function models. Using regLM, we designed synthetic yeast promoters of defined strength as well as cell type-specific human enhancers. We show that the synthetic CREs generated by regLM contain biological features similar to those of experimentally validated CREs.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lal, A., Garfield, D., Biancalani, T., Eraslan, G. (2024). regLM: Designing Realistic Regulatory DNA with Autoregressive Language Models. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_24
DOI: https://doi.org/10.1007/978-1-0716-3989-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-1-0716-3988-7
Online ISBN: 978-1-0716-3989-4
eBook Packages: Computer Science, Computer Science (R0)