

  • Matters Arising

Reusability report: Learning the language of synthetic methods used in medicinal chemistry

The Original Article was published on 28 January 2021


Fig. 1: Comparison of rxnfp (PST) embeddings of reactions from the AZ-ELN dataset and the SCH27k dataset.
Fig. 2: An example of a reaction type present in the AZ-ELN but not in the public dataset.
Fig. 3: Performance of rxnfp classifiers on AZ-ELN data.

Data availability

In this work, we applied the methods described in the Schwaller et al. publication to reactions captured in AstraZeneca's Electronic Laboratory Notebook (AZ-ELN). The AZ-ELN is a platform that collects and archives, in digital form, every chemical synthesis experiment run within the company, which makes it a valuable resource for large-scale investigations of this kind. Although these data are proprietary, we show that similar results can be reproduced on the publicly available Schneider 50,000-reaction set: filtering it to the reaction classes whose labels start with 1, 2 or 3 yields the 27,000-reaction dataset described in the ‘Dataset preparation’ section. This comparison also matters for understanding how well publicly available reaction sets represent the datasets used in the pharmaceutical industry.
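The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the record layout (`reaction_id`, `class_label`), the helper names and the example labels are all hypothetical, and the sketch assumes the class labels follow NameRxn's dotted "superclass.class.type" form (e.g. "3.1.1"), reading "start with 1, 2 or 3" as superclasses 1, 2 and 3.

```python
from typing import FrozenSet, Iterable, List, Tuple


def superclass(label: str) -> int:
    """Top-level superclass of a dotted label: '10.1.1' -> 10, '1.2.1' -> 1."""
    return int(label.split(".", 1)[0])


def filter_by_superclass(
    records: Iterable[Tuple[str, str]],
    keep: FrozenSet[int] = frozenset({1, 2, 3}),
) -> List[Tuple[str, str]]:
    """Keep (reaction_id, class_label) records whose superclass is in `keep`.

    The parsed integer is compared instead of using label.startswith("1"),
    because a plain string prefix test would also keep a label such as "10.1.1".
    """
    return [(rid, label) for rid, label in records if superclass(label) in keep]


# Hypothetical records; identifiers and labels are illustrative only.
records = [
    ("rxn-001", "1.2.1"),
    ("rxn-002", "2.1.1"),
    ("rxn-003", "3.1.1"),
    ("rxn-004", "6.1.1"),
    ("rxn-005", "10.1.1"),
]
subset = filter_by_superclass(records)
print([label for _, label in subset])  # ['1.2.1', '2.1.1', '3.1.1']
```

Parsing the superclass as an integer is the main design choice here: it stays correct if two-digit superclasses are present in the input, which a string-prefix test would silently mishandle.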

Code availability

The rxnfp code and the experiments on the public datasets, as well as an interactive TMAP, are provided on the GitHub page of the original work (ref. 1): https://rxn4chemistry.github.io/rxnfp (ref. 7). We applied the code without modifications. SMILES strings were randomized with RDKit's Chem.MolToRandomSmilesVect function (ref. 14).
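As an illustration of the randomization step, the sketch below calls RDKit's `Chem.MolToRandomSmilesVect` per molecule and applies it to each component of a reaction SMILES. It is a hedged example, not the authors' pipeline: the helper names `randomize_smiles` and `randomize_reaction` are hypothetical, the fixed seed is an arbitrary choice, and the reaction helper assumes plain "reactants>agents>products" strings without CXSMILES extensions.

```python
from __future__ import annotations

from rdkit import Chem


def randomize_smiles(smiles: str, n: int, seed: int = 42) -> list[str]:
    """Return n randomized (non-canonical) SMILES for a single molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    # MolToRandomSmilesVect returns a vector of SMILES strings; a fixed
    # seed makes the output reproducible between runs.
    return list(Chem.MolToRandomSmilesVect(mol, n, randomSeed=seed))


def randomize_reaction(rxn_smiles: str, seed: int = 42) -> str:
    """Randomize every molecule in a 'reactants>agents>products' string once.

    Hypothetical helper: empty sides (e.g. no agents in 'A.B>>C') are kept
    empty, and dot-separated fragments are randomized independently.
    """
    new_sides = []
    for side in rxn_smiles.split(">"):
        mols = [m for m in side.split(".") if m]
        new_sides.append(".".join(randomize_smiles(m, 1, seed)[0] for m in mols))
    return ">".join(new_sides)


print(randomize_smiles("c1ccccc1O", 3))
print(randomize_reaction("CCO.CC(=O)O>>CCOC(C)=O"))
```

Every randomized string still encodes the same molecule, so canonicalizing it again with `Chem.MolToSmiles` recovers the original canonical SMILES; only the atom ordering of the written string changes.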

References

  1. Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).


  2. Schneider, N., Lowe, D. M., Sayle, R. A., Tarselli, M. A. & Landrum, G. A. Big data from pharmaceutical patents: a computational analysis of medicinal chemists’ bread and butter. J. Med. Chem. 59, 4385–4402 (2016).


  3. NameRxn (NextMove Software, accessed 22 December 2020); http://www.nextmovesoftware.com/namerxn.html

  4. Lowe, D. Chemical reactions from US patents (1976–Sep2016) https://figshare.com/articles/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873 (2017).

  5. Schneider, N., Lowe, D. M., Sayle, R. A. & Landrum, G. A. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J. Chem. Inf. Model. 55, 39–53 (2015).


  6. Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).


  7. Schwaller, P. et al. rxn4chemistry/rxnfp: initial Zenodo release (version v0.0.7). (Zenodo, 2020); https://doi.org/10.5281/zenodo.4277570

  8. Brown, D. & Boström, J. Analysis of past and present synthetic methodologies on medicinal chemistry: where have all the new reactions gone? J. Med. Chem. 59, 4443–4458 (2016).


  9. Carey, J. S., Laffan, D., Thomson, C. & Williams, M. T. Analysis of the reactions used for the preparation of drug candidate molecules. Org. Biomol. Chem. 4, 2337–2347 (2006).


  10. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).


  11. Haghighi, S. et al. PyCM: multiclass confusion matrix library in Python. J. Open Source Softw. 3, 729 (2018).


  12. Meanwell, N. Synopsis of some recent tactical application of bioisosteres in drug design. J. Med. Chem. 54, 2529–2591 (2011).


  13. Schwaller, P. et al. Prediction of chemical reaction yields using deep learning. Mach. Learn. Sci. Technol. 2, 015016 (2021).


  14. Landrum, G. A. RDKit: open-source cheminformatics software, version 2020.03 (RDKit, 2020); http://www.rdkit.org

  15. Scott, J. S. et al. Tricyclic indazoles—a novel class of selective estrogen receptor degrader antagonists. J. Med. Chem. 62, 1593–1608 (2019).



Author information


Contributions

All authors contributed to the writing of the manuscript. J.P.J. trained and analysed the machine learning models.

Corresponding author

Correspondence to Jonas Boström.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Timothy Cernak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Discussion, Supplementary Tables 1–3 and Fig. 1.


About this article


Cite this article

Janet, J.P., Tomberg, A. & Boström, J. Reusability report: Learning the language of synthetic methods used in medicinal chemistry. Nat Mach Intell 3, 572–575 (2021). https://doi.org/10.1038/s42256-021-00367-2


  • DOI: https://doi.org/10.1038/s42256-021-00367-2
