Title: Automating Genetic Algorithm Mutations for Molecules Using a Masked Language Model

Journal Article · IEEE Transactions on Evolutionary Computation

Inspired by the evolution of biological systems, genetic algorithms have been applied to generate solutions for optimization problems in a variety of scientific and engineering disciplines. For a given problem, a suitable genome representation must be defined along with a mutation operator to generate subsequent generations. Unlike natural systems, which display a variety of complex rearrangements (e.g., mobile genetic elements), mutation in genetic algorithms commonly uses only random point-wise changes. Generalizing beyond point-wise mutations is difficult because useful genome rearrangements depend on the representation and problem domain. To move beyond the limitations of manually defined point-wise changes, we propose using techniques from masked language models to generate mutations automatically. As a first step, common subsequences within a given population are used to generate a vocabulary. The vocabulary is then used to tokenize each genome. A masked language model is trained on the tokenized data to generate possible rearrangements (i.e., mutations). To illustrate the proposed strategy, we use string representations of molecules and a genetic algorithm to optimize for drug-likeness and synthesizability. Our results show that moving beyond random point-wise mutations accelerates genetic algorithm optimization.
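
The pipeline described in the abstract (vocabulary from common subsequences, tokenization, mask-and-fill mutation) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the masked language model is replaced here by a simple count-based model over neighbouring tokens, and every function name, parameter, and the toy population are assumptions made for illustration. Fitness scoring for drug-likeness and synthesizability, and any check that a mutated string is a valid molecule, are left out.

```python
import random
from collections import Counter, defaultdict


def build_vocab(population, max_len=3, min_count=2):
    """Keep every single character, plus substrings of length 2..max_len
    that occur at least min_count times across the population."""
    counts = Counter()
    for genome in population:
        for n in range(2, max_len + 1):
            for i in range(len(genome) - n + 1):
                counts[genome[i:i + n]] += 1
    vocab = {ch for genome in population for ch in genome}
    vocab.update(s for s, c in counts.items() if c >= min_count)
    return vocab


def tokenize(genome, vocab, max_len=3):
    """Greedy longest-match tokenization; single characters always match."""
    tokens, i = [], 0
    while i < len(genome):
        for n in range(min(max_len, len(genome) - i), 0, -1):
            piece = genome[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


def fit_context_model(tokenized):
    """Stand-in for a masked language model (assumption): count which
    tokens were seen between each (left, right) pair of neighbours."""
    model = defaultdict(Counter)
    for tokens in tokenized:
        padded = ["<s>"] + tokens + ["</s>"]
        for left, mid, right in zip(padded, padded[1:], padded[2:]):
            model[(left, right)][mid] += 1
    return model


def mutate(tokens, model, rng):
    """Mask one random token and sample a replacement from the tokens
    observed in the same (left, right) context."""
    if not tokens:
        return []
    pos = rng.randrange(len(tokens))
    padded = ["<s>"] + tokens + ["</s>"]
    candidates = model.get((padded[pos], padded[pos + 2]))
    if not candidates:
        return list(tokens)  # unseen context: leave the genome unchanged
    choices, weights = zip(*candidates.items())
    child = list(tokens)
    child[pos] = rng.choices(choices, weights=weights, k=1)[0]
    return child


# Toy population of SMILES-like strings (illustrative only).
population = ["CCO", "CCN", "CCOC", "c1ccccc1O", "c1ccccc1N"]
vocab = build_vocab(population)
tokenized = [tokenize(g, vocab) for g in population]
model = fit_context_model(tokenized)
rng = random.Random(0)
for parent in tokenized:
    print("".join(parent), "->", "".join(mutate(parent, model, rng)))
```

Because a replacement token can span several characters, a single mask-and-fill step can swap a whole fragment rather than one character, which is the sense in which this goes beyond point-wise mutation. In a full genetic algorithm, each child would still need to be validated and scored before selection.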

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
Grant/Contract Number:
AC05-00OR22725; AC02-06CH11357; AC52-07NA27344; AC52-06NA25396
OSTI ID:
1845799
Journal Information:
IEEE Transactions on Evolutionary Computation, Vol. 26, Issue 4; ISSN 1089-778X
Publisher:
IEEE
Country of Publication:
United States
Language:
English
