DOI: 10.1145/3512290.3528824
research-article

An evolutionary fragment-based approach to molecular fingerprint reconstruction

Published: 08 July 2022

Abstract

In silico drug discovery relies on a variety of established representations for storing and processing molecular data, and the choice of representation strongly influences which methods and algorithms can be applied. Molecular fingerprints in the form of fixed-size bit vectors are a widely used representation that captures structural features of a molecule and enables a straightforward way of estimating molecular similarity. However, since fingerprints are not invertible, they are rarely used for molecule generation tasks. This study presents an approach, based on genetic algorithms, to the reconstruction of molecules from their fingerprint representation. The algorithm assembles molecules from BRICS fragments and therefore generates only valid molecular structures. We demonstrate that the genetic algorithm is able to construct molecules similar to the specified target, or even to reconstruct the original molecule. Furthermore, to illustrate how this genetic algorithm unlocks fingerprints as a representation for other in silico drug discovery methods, we introduce a novel Transformer neural language model, trained on molecular fingerprints, as a molecule generation model.
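The idea in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: fragments are single letters rather than BRICS fragments, a "molecule" is a linear sequence of fragments, `toy_fingerprint` is a hypothetical hash-based stand-in for a chemical fingerprint, and the operators and parameters of `reconstruct` are arbitrary choices. It shows only the general pattern: compare bit vectors by Tanimoto similarity and evolve fragment assemblies toward a target fingerprint.

```python
import random
import zlib

N_BITS = 64      # toy fingerprint length (assumption, chosen for readability)
MAX_LEN = 8      # cap applied to crossover children (assumption)
FRAGMENTS = ["A", "B", "C", "D", "E", "F"]  # hypothetical stand-ins for fragments


def tanimoto(a, b):
    """Tanimoto similarity of two bit vectors stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def toy_fingerprint(mol):
    """Hash each fragment and each adjacent fragment pair into a bit vector.

    Like a real fingerprint, this is lossy: several features may share a bit.
    """
    features = list(mol) + [x + "-" + y for x, y in zip(mol, mol[1:])]
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode()) % N_BITS)
    return bits


def crossover(p1, p2, rng):
    """Join a non-empty prefix of p1 with a suffix of p2, capped at MAX_LEN."""
    cut1 = rng.randrange(1, len(p1) + 1)
    cut2 = rng.randrange(0, len(p2))
    return (p1[:cut1] + p2[cut2:])[:MAX_LEN]


def mutate(mol, rng):
    """Replace, insert, or delete one fragment; never returns an empty molecule."""
    mol = list(mol)
    i = rng.randrange(len(mol))
    op = rng.choice(["replace", "insert", "delete"])
    if op == "replace":
        mol[i] = rng.choice(FRAGMENTS)
    elif op == "insert":
        mol.insert(i, rng.choice(FRAGMENTS))
    elif len(mol) > 1:
        del mol[i]
    return tuple(mol)


def reconstruct(target_fp, generations=200, pop_size=50, seed=0):
    """Evolve fragment sequences whose toy fingerprint approaches target_fp."""
    rng = random.Random(seed)

    def fitness(mol):
        return tanimoto(toy_fingerprint(mol), target_fp)

    pop = [tuple(rng.choice(FRAGMENTS) for _ in range(rng.randint(1, 5)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:      # fingerprint matched exactly
            break
        elite = pop[: pop_size // 5]    # elitism: the best candidates survive
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = elite + children
    return max(pop, key=fitness)


target = ("A", "C", "B", "E")
best = reconstruct(toy_fingerprint(target))
print(best, tanimoto(toy_fingerprint(best), toy_fingerprint(target)))
```

Because every candidate is built only by joining and editing fragments, every candidate is a valid assembly by construction, which mirrors the validity argument in the abstract. A similarity of 1.0 in this sketch means the toy fingerprints match, not necessarily that the fragment sequence equals the target, since the fingerprint is lossy.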



    Published In

    GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2022
    1472 pages
    ISBN:9781450392372
    DOI:10.1145/3512290

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. genetic algorithm
    2. molecular representation
    3. molecule design

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
