Data availability
In this work, we applied the methods described by Schwaller et al. to reactions captured in the AstraZeneca Electronic Laboratory Notebook (AZ-ELN). The AZ-ELN is a platform that collects and archives, in digital format, every chemical synthesis experiment run within the company, which makes it a valuable resource for large-scale investigations of this kind. Although the data are proprietary, we show that similar results can be reproduced on publicly available data from the Schneider 50,000 set, which the reader can filter to contain only reaction classes starting with 1, 2 or 3 to obtain the 27,000-reaction dataset described in the ‘Dataset preparation’ section. This comparison is also important for understanding how well publicly available reaction sets represent datasets in the pharmaceutical industry.
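The class-based filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the reaction classes are stored as dotted labels such as 1.2.1, and the column names (rxn_class, rxn) and the toy rows are hypothetical placeholders, not the layout of the actual Schneider 50,000 file.

```python
import csv
import io

# Superclasses to keep: labels whose first dotted component is 1, 2 or 3.
KEEP_SUPERCLASSES = frozenset({"1", "2", "3"})


def superclass(rxn_class: str) -> str:
    """Return the first dotted component of a class label, e.g. '1.2.1' -> '1'."""
    return rxn_class.strip().split(".", 1)[0]


def filter_by_superclass(rows, class_column="rxn_class", keep=KEEP_SUPERCLASSES):
    """Keep only rows whose reaction class falls in one of the given superclasses."""
    return [row for row in rows if superclass(row[class_column]) in keep]


# Toy stand-in for the real file; column names and rows are hypothetical.
TOY_CSV = """rxn_class,rxn
1.2.1,toy_reaction_a
2.1.2,toy_reaction_b
3.1.1,toy_reaction_c
6.2.1,toy_reaction_d
10.1.1,toy_reaction_e
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
kept = filter_by_superclass(rows)
print([row["rxn_class"] for row in kept])
```

Comparing the first dotted component, rather than testing whether the label string starts with the character 1, avoids accidentally keeping a label such as 10.1.1.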
Code availability
The rxnfp code and the experiments on the public datasets, as well as an interactive TMAP, are provided at the GitHub page of the original work (ref. 1): https://rxn4chemistry.github.io/rxnfp (ref. 7). We applied it without modifications. RDKit’s Chem.MolToRandomSmilesVect function was used to randomize SMILES (ref. 14).
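As a rough illustration of the SMILES randomization step, the sketch below calls RDKit’s Chem.MolToRandomSmilesVect on a single molecule. The randomize_reaction helper, which applies the same call to each molecule of a reaction SMILES, is a hypothetical convenience written for this example; the number of variants, the seed and the way reactions are split are assumptions and are not taken from the original work.

```python
from rdkit import Chem


def randomize_molecule(smiles: str, n_variants: int = 3, seed: int = 42) -> list:
    """Return n_variants randomized (non-canonical) SMILES for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return list(Chem.MolToRandomSmilesVect(mol, n_variants, randomSeed=seed))


def randomize_reaction(rxn_smiles: str, seed: int = 42) -> str:
    """Hypothetical helper: randomize every molecule in 'reactants>agents>products'."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"Expected a reaction SMILES with two '>': {rxn_smiles!r}")
    randomized_parts = []
    for part in parts:
        # An empty part (e.g. no agents) simply yields an empty string.
        molecules = [randomize_molecule(s, 1, seed)[0] for s in part.split(".") if s]
        randomized_parts.append(".".join(molecules))
    return ">".join(randomized_parts)


ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"
variants = randomize_molecule(ASPIRIN)
print(variants)
print(randomize_reaction("CC(=O)O.OCC>>CC(=O)OCC"))
```

Each randomized string encodes the same molecule as its input, so canonicalizing a variant with Chem.MolToSmiles should give back the canonical form of the original.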
References
1. Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).
2. Schneider, N., Lowe, D. M., Sayle, R. A., Tarselli, M. A. & Landrum, G. A. Big data from pharmaceutical patents: a computational analysis of medicinal chemists’ bread and butter. J. Med. Chem. 59, 4385–4402 (2016).
3. NameRxn (Nextmove Software, accessed 22 December 2020); http://www.nextmovesoftware.com/namerxn.html
4. Lowe, D. Chemical reactions from US patents (1976–Sep2016) https://figshare.com/articles/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873 (2017).
5. Schneider, N., Lowe, D. M., Sayle, R. A. & Landrum, G. A. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J. Chem. Inf. Model. 55, 39–53 (2015).
6. Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).
7. Schwaller, P. et al. rxn4chemistry/rxnfp: initial Zenodo release (version v0.0.7). (Zenodo, 2020); https://doi.org/10.5281/zenodo.4277570
8. Brown, D. & Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 59, 4443–4458 (2016).
9. Carey, J. S., Laffan, D., Thomson, C. & Williams, M. T. Analysis of the reactions used for the preparation of drug candidate molecules. Org. Biomol. Chem. 4, 2337–2347 (2006).
10. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
11. Haghighi, S. et al. PyCM: multiclass confusion matrix library in Python. J. Open Source Softw. 3, 729 (2018).
12. Meanwell, N. Synopsis of some recent tactical application of bioisosteres in drug design. J. Med. Chem. 54, 2529–2591 (2011).
13. Schwaller, P. et al. Prediction of chemical reaction yields using deep learning. Mach. Learn. Sci. Technol. 2, 015016 (2021).
14. Landrum, G. A. RDKit: open-source cheminformatics software, version 2020.03 (RDKit, 2020); http://www.rdkit.org
15. Scott, J. S. et al. Tricyclic indazoles—a novel class of selective estrogen receptor degrader antagonists. J. Med. Chem. 62, 1593–1608 (2019).
Author information
Contributions
All authors contributed to the writing of the manuscript. J.P.J. trained and analysed the machine learning models.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Machine Intelligence thanks Timothy Cernak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Discussion, Supplementary Tables 1–3 and Fig. 1.
About this article
Cite this article
Janet, J.P., Tomberg, A. & Boström, J. Reusability report: Learning the language of synthetic methods used in medicinal chemistry. Nat Mach Intell 3, 572–575 (2021). https://doi.org/10.1038/s42256-021-00367-2
DOI: https://doi.org/10.1038/s42256-021-00367-2
This article is cited by
- Revisiting code reusability. Nature Machine Intelligence (2022)