
The centroidal algorithm in molecular similarity and diversity calculations on confidential datasets

Journal of Computer-Aided Molecular Design

Summary

A chemical structure provides an exhaustive description of a compound, but it is often proprietary and therefore an impediment to the exchange of information. For example, structure disclosure is often needed to select the most similar or most dissimilar compounds. The authors propose a centroidal algorithm based on structural fragments (screens) that can be used efficiently for similarity and diversity selections without disclosing the structures in the reference set. For increased security, the authors recommend that such a set contain at least some tens of structures. An analysis of the feasibility of reverse engineering showed that the difficulty of the problem grows as the screen radius decreases. The algorithm is illustrated with concrete calculations on known steroidal, quinoline, and quinazoline drugs. We also investigate the problem of scaffold identification in a combinatorial library dataset. The results show that relatively small screens, with a radius of 2 bond lengths, perform well in similarity sorting, while radius-4 screens yield better results in diversity sorting. The software implementation of the algorithm takes an SDF file containing a reference set and generates screens of various radii, which are subsequently used for similarity and diversity sorting of external SDF files. Since reverse engineering the reference-set molecules from their screens is as difficult as breaking the RSA asymmetric encryption algorithm, the generated screens can be stored openly without further encryption. This approach ensures that an end user transfers only a set of structural fragments and no other data. Like other encryption algorithms, the centroidal algorithm cannot give a 100% guarantee of protecting a chemical structure in a dataset, but the probability of identifying the initial structure is very small, on the order of 10⁻⁴⁰ in typical cases.
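The summary does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each molecule has already been reduced to a set of screen labels (the strings used here are hypothetical placeholders, and screen generation from an SDF file is not shown), builds a centroid as the fraction of reference molecules containing each screen, and ranks external molecules against it with a min/max (Ruzicka) similarity. The function names and the choice of similarity measure are assumptions made for illustration.

```python
from collections import Counter


def build_centroid(reference_screens):
    """Return {screen: fraction of reference molecules that contain it}.

    Assumption: the centroid is a simple per-screen frequency; the
    published algorithm may weight screens differently.
    """
    counts = Counter()
    for screens in reference_screens:
        counts.update(set(screens))  # count each screen once per molecule
    n = len(reference_screens)
    return {frag: c / n for frag, c in counts.items()}


def centroid_similarity(screens, centroid):
    """Ruzicka (min/max) similarity of a binary screen set to the centroid."""
    frags = set(screens)
    # For a screen present in the molecule: min(1, w) = w and max(1, w) = 1.
    shared = sum(centroid.get(f, 0.0) for f in frags)
    # For a screen only in the centroid: min(0, w) = 0 and max(0, w) = w.
    missing = sum(w for f, w in centroid.items() if f not in frags)
    denom = len(frags) + missing
    return shared / denom if denom else 0.0


def sort_by_similarity(candidates, centroid, most_similar_first=True):
    """Rank candidate molecules; ascending order gives a diversity-style ranking."""
    scored = [(name, centroid_similarity(s, centroid)) for name, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=most_similar_first)


# Hypothetical screen labels standing in for real structural fragments.
reference = [{"a", "b"}, {"a", "c"}]
external = {"m1": {"a", "b"}, "m2": {"x"}, "m3": {"a"}}
centroid = build_centroid(reference)
print(sort_by_similarity(external, centroid))
```

In this sketch only the centroid dictionary would need to leave the owner of the reference set, which mirrors the summary's point that the end user transfers structural fragments rather than structures.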



Acknowledgements

The authors thank Sergey Tkachenko, Caroline Williams, Alex Khvat, Nikolay Savchuk and Andrey Ivachtchenko for productive discussions and their support of this work.

Author information


Corresponding author

Correspondence to Nikolay Osadchiy.


About this article

Cite this article

Trepalin, S., Osadchiy, N. The centroidal algorithm in molecular similarity and diversity calculations on confidential datasets. J Comput Aided Mol Des 19, 715–729 (2005). https://doi.org/10.1007/s10822-005-9023-1


  • DOI: https://doi.org/10.1007/s10822-005-9023-1
