Abstract
A fingerprint based on the monomer composition of nonribosomal peptides (NRPs), the monomer composition fingerprint (MCFP), was introduced previously. MCFP is a method for obtaining a representative description of NRP structures from their monomer composition, in fingerprint form. It enabled effective screening and prediction of biological activities on the Norine NRP database. In this paper, we present an extension of the MCFP fingerprint. The extension adds a few columns to the fingerprint, representing monomer clusters, 2D structures, peptide categories, and peptide diversity. All of these data are extracted from the NRP structure. Experiments with the Norine NRP database showed that the extended MCFP, which we call the Monomer Structure FingerPrint (MSFP), produced high prediction accuracy (> 95%) together with a high recall rate (86%) when used for prediction and similarity searching. This study indicates that a fingerprint built mainly from monomer composition can be substantially improved by adding columns that carry useful information about the monomer composition and 2D structure of NRPs.
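As a rough illustration of the idea, the sketch below builds a count-based composition fingerprint and appends a few extra columns before comparing two fingerprints. It is a minimal sketch under stated assumptions, not the published MSFP definition: the monomer vocabulary, the cluster assignment, the cyclic flag (standing in for a 2D-structure or category column), the diversity column, and all function names are hypothetical, and the Tanimoto coefficient for count vectors is used here only as one common similarity measure.

```python
from collections import Counter

# Hypothetical monomer vocabulary and cluster assignment, for illustration only;
# they are not taken from Norine or from the paper.
MONOMERS = ["Ala", "Leu", "Val", "Orn", "D-Phe", "Pro"]
CLUSTERS = {"Ala": 0, "Leu": 0, "Val": 0, "Orn": 1, "D-Phe": 2, "Pro": 2}
N_CLUSTERS = 3


def composition_fingerprint(monomers):
    """Count-based fingerprint: one column per monomer in the vocabulary."""
    counts = Counter(monomers)
    return [counts.get(m, 0) for m in MONOMERS]


def extended_fingerprint(monomers, is_cyclic):
    """Composition columns followed by illustrative extra columns (assumed layout)."""
    base = composition_fingerprint(monomers)
    # One column per monomer cluster, counting monomers that fall in it.
    cluster_counts = [0] * N_CLUSTERS
    for m in monomers:
        if m in CLUSTERS:
            cluster_counts[CLUSTERS[m]] += 1
    # Number of distinct monomers, used here as a simple diversity column.
    diversity = len(set(monomers))
    return base + cluster_counts + [int(is_cyclic), diversity]


def tanimoto(a, b):
    """Tanimoto coefficient generalized to non-negative count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


if __name__ == "__main__":
    # Two toy peptides given as monomer lists.
    query = extended_fingerprint(["Val", "Orn", "Leu", "D-Phe", "Pro"] * 2, True)
    other = extended_fingerprint(["Ala", "Leu", "Val", "Pro"], False)
    print(query)
    print(tanimoto(query, other))
```

Ranking database peptides by their similarity to a query fingerprint gives a simple similarity search; the same vectors could also serve as input features for a classifier.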



Funding
This work was supported by Lille University, CNRS and Programme national d’aide à l’Accueil en Urgence des Scientifiques en Exil (PAUSE).
Contributions
The research was conducted by mutual contributions of all authors. All authors read and approved the final manuscript.
Cite this article
Abdo, A., Ghaleb, E., Alajmi, N.K.A. et al. Monomer structure fingerprints: an extension of the monomer composition version for peptide databases. J Comput Aided Mol Des 34, 1147–1156 (2020). https://doi.org/10.1007/s10822-020-00336-8
DOI: https://doi.org/10.1007/s10822-020-00336-8