Scalable Lead Prediction with Transformers using HPC resources

Authors:
Archit Vasan, Argonne National Laboratory, United States of America (ORCID: 0000-0002-8299-1033)
Thomas Brettin, Argonne National Lab, USA (ORCID: 0000-0001-9301-9760)
Rick Stevens, Argonne National Laboratory, United States of America (ORCID: 0000-0002-4268-4020)
Arvind Ramanathan, Argonne National Laboratory, United States of America (ORCID: 0000-0002-1622-5488)
Venkatram Vishwanath, Argonne National Laboratory, United States of America (ORCID: 0000-0001-7248-6116)

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, November 2023, Pages 123
https://doi.org/10.1145/3624062.3624081

Published: 12 November 2023

ABSTRACT

A promising direction in cancer drug discovery is high-throughput screening of extensive compound datasets to identify advantageous properties, including the ability of compounds to interact with relevant biomolecules such as proteins. However, traditional structural approaches for assessing binding affinity, such as free energy methods or molecular docking, pose significant computational bottlenecks when dealing with such vast datasets. To address this, we have developed a docking surrogate called the SMILES transformer (ST), which learns molecular features from the SMILES representation of compounds and approximates their binding affinity. SMILES strings are first tokenized using a well-established SMILES-pair tokenizer and fed into a BERT-like Transformer model to generate a vector embedding for each molecule. These embeddings are then fed into a regression model to predict the binding affinity. Leveraging the high-performance computing resources at Argonne National Lab, we devised a workflow to scale model training and inference across multiple supercomputing nodes. To evaluate the performance and accuracy of our workflow, we conducted experiments using molecular docking binding affinity data on multiple receptors, comparing ST with another state-of-the-art docking surrogate. Both surrogates yielded comparable validation R² (val-r2) values of between 70% and 90%, affirming the capability of ST to learn molecular features directly from language-based data. Furthermore, one significant advantage of the ST approach is that its tokenization preprocessing is notably faster than that of the alternative method, which requires generating molecular descriptors using Mordred. Our workflow facilitated screening of ∼3 billion compounds on 48 nodes of the Polaris supercomputer in approximately an hour.
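To put the screening figures above in perspective, the following back-of-envelope sketch converts them into an average throughput. The compound count, node count, and wall time are the approximate values quoted in the abstract; the figure of four GPUs per node, and the assumption that all of them were used for inference, are not stated in the abstract and are introduced here only for illustration.

```python
# Back-of-envelope throughput implied by the abstract's screening figures.
TOTAL_COMPOUNDS = 3_000_000_000  # "~3 billion compounds" (approximate)
NODES = 48                       # Polaris nodes used
WALL_SECONDS = 3600              # "approximately an hour" (approximate)
GPUS_PER_NODE = 4                # ASSUMPTION: four GPUs per node, all in use


def throughput(total, seconds, workers):
    """Average items per second handled by each of `workers` equal workers."""
    return total / (seconds * workers)


per_node = throughput(TOTAL_COMPOUNDS, WALL_SECONDS, NODES)
per_gpu = throughput(TOTAL_COMPOUNDS, WALL_SECONDS, NODES * GPUS_PER_NODE)

print(f"{per_node:,.0f} compounds/s per node")  # 17,361 compounds/s per node
print(f"{per_gpu:,.0f} compounds/s per GPU")    # 4,340 compounds/s per GPU
```

Because the inputs are rounded, these numbers are order-of-magnitude estimates rather than measured rates.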
In summary, our approach presents an efficient means to screen extensive compound databases for molecules whose properties make them potential lead compounds targeting cancer. Looking ahead, an important future direction for our workflow is integrating de novo drug design, enabling us to scale our efforts to explore the limits of synthesizable compounds within chemical space.
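The preprocessing step described in the abstract, turning a SMILES string into a sequence of token ids that a Transformer can consume, can be sketched as follows. This is a minimal illustration, not the authors' code: it uses a simple atom-level regular-expression tokenizer as a stand-in for the SMILES-pair tokenizer named in the abstract, and the names `tokenize`, `build_vocab`, `encode`, `[PAD]`, and `[UNK]` are hypothetical. The BERT-like encoder and the regression head are not reproduced here.

```python
import re

# Atom-level SMILES tokens (a simple stand-in, NOT the SMILES-pair tokenizer).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms, e.g. [NH3+] or [C@H]
    r"|Br|Cl"                # two-letter atoms of the organic subset
    r"|%\d{2}"               # two-digit ring-closure labels, e.g. %12
    r"|[A-Za-z]"             # remaining single letters (lowercase = aromatic)
    r"|\d"                   # one-digit ring-closure labels
    r"|[=#\-+()/\\.:~@*$]"   # bonds, branches, and other symbols
)

PAD, UNK = "[PAD]", "[UNK]"  # hypothetical special tokens


def tokenize(smiles):
    """Split a SMILES string into tokens; reject unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Assign an integer id to every token, in order of first appearance."""
    vocab = {PAD: 0, UNK: 1}
    for smiles in corpus:
        for tok in tokenize(smiles):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, vocab, max_len):
    """Map tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(smiles)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


corpus = ["CC(=O)Oc1ccccc1C(=O)O", "ClCCBr", "C[C@H](N)C(=O)O"]
vocab = build_vocab(corpus)
aspirin_tokens = tokenize(corpus[0])
encoded = encode("ClCCBr", vocab, max_len=8)
print(aspirin_tokens)
print(encoded)
```

In a pipeline of the kind the abstract describes, fixed-length id sequences like `encoded` would be batched and passed to the Transformer encoder, whose output embedding feeds the regression model. A pair-encoding tokenizer would additionally merge frequent adjacent tokens into larger units, which this sketch does not attempt.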

Index Terms

Scalable Lead Prediction with Transformers using HPC resources
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Machine learning

Index terms have been assigned to the content through auto-classification.

Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062

Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Deep Learning
Drug discovery
High performance computing
Language Models
Qualifiers
- research-article
- Research
- Refereed limited