Skip to main content

Advertisement

Log in

MolAlign: an algorithm for aligning multiple small molecules

  • Published:
Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

Abstract

In small molecule drug discovery projects, the receptor structure is not always available. In such cases it is enormously useful to be able to align known ligands in the way they bind in the receptor. Here we shall present an algorithm for the alignment of multiple small molecule ligands. This algorithm takes pre-generated conformers as input, and proposes aligned assemblies of the ligands. The algorithm consists of two stages: the first stage is to perform alignments for each pair of ligands, the second stage makes use of the results from the first stage to build up multiple ligand alignment assemblies using a novel iterative procedure. The scoring functions are improved versions of the one mentioned in our previous work. We have compared our results with some recent publications. While an exact comparison is impossible, it is clear that our algorithm is fast and produces very competitive results.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

References

  1. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967

    Article  CAS  Google Scholar 

  2. Lemmen C, Langauer T (2000) Computational methods for the structural alignment of molecules. J Comput Aided Mol Des 14:215–232

    Article  CAS  Google Scholar 

  3. Wolber G, Seidel T, Bendix F, Langer T (2008) Molecule-pharmacophore superpositioning and pattern matching in computational drug design. Drug Discov Today 13:23–39

    Article  CAS  Google Scholar 

  4. Leach AR, Gillet VJ, Lewis RA, Taylor R (2010) Three-dimensional pharmacophore methods in drug discovery. J Med Chem 53:539–558

    Article  CAS  Google Scholar 

  5. Jones G, Willett P, Glen RC (1995) A genetic algorithm for flexible molecular overlay and pharmacophore elucidation. J Comput Aided Mol Des 9:532–549

    Article  CAS  Google Scholar 

  6. Barnum D, Greene J, Smellie A, Sprague P (1996) Identification of common functional configurations among molecules. J Chem Inf Comput Sci 36:563–571

    Article  CAS  Google Scholar 

  7. Lemmen C, Lengauer T, Klebe G (1998) FLEXS: a method for fast flexible ligand superposition. J Med Chem 41:4502–4520

    Article  CAS  Google Scholar 

  8. Labute P, Williams C, Feher M, Sourial E, Schmidt JM (2001) Flexible alignment of small molecules. J Med Chem 44:1483–1490

    Article  CAS  Google Scholar 

  9. ROCS, from OpenEye Scientific Software, http://www.eyesopen.com

  10. Dixon SL, Smondyrev AM, Knoll EH, Rao SN, Shaw DE, Friesner RA (2006) PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20:647–671

    Article  CAS  Google Scholar 

  11. Gardiner EJ, Cosgrove DA, Taylor R, Gillet VJ (2009) Multiobjective optimization of pharmacophore hypotheses: bias toward low-energy conformations. J Chem Inf Model 49:2761–2773

    Article  CAS  Google Scholar 

  12. Jones G (2010) GAPE: an improved genetic algorithm for pharmacophore elucidation. J Chem Inf Model 50:2001–2018

    Article  CAS  Google Scholar 

  13. Taylor R, Cole JC, Cosgrove DA, Gardiner EJ, Gillet VJ, Korb O (2012) Development and validation of an improved algorithm for overlaying flexible molecules. J Comput Aided Mol Des 26:451–472

    Article  CAS  Google Scholar 

  14. Korb O, Monecke P, Hessler G, Stützle T, Exner TE (2010) pharmACOphore: multiple flexible ligand alignment based on ant colony optimization. J Chem Inf Model 50:1669–1681

    Article  CAS  Google Scholar 

  15. Moser D, Wittmann SK, Kramer J, Blöcher R, Achenbach J, Pogoryelov D, Proschak E (2015) PENG: a neural gas-based approach for pharmacophore elucidation. Method design, validation, and virtual screening for novel ligands of LTA4H. J Chem Inf Model 55:284–293

    Article  CAS  Google Scholar 

  16. Shin W, Hyun SA, Chae CH, Chon JK (2009) Flexible alignment of small molecules using the penalty method. J Chem Inf Model 49:1879–1888

    Article  CAS  Google Scholar 

  17. Tosco P, Balle T, Shiri F (2011) Open3DALIGN: an open-source software aimed at unsupervised ligand alignment. J Comput Aided Mol Des 25:777–783

    Article  CAS  Google Scholar 

  18. Diller DJ, Connell ND, Welsh WJ (2015) Avalanche for shape and feature-based virtual screening with 3D alignment. J Comput Aided Mol Des 29:1015–1024

    Article  CAS  Google Scholar 

  19. Kawabata T, Nakamura H (2014) 3D flexible alignment using 2D maximum common substructure: dependence of prediction accuracy on target-reference chemical similarity. J Chem Inf Model 54:1850–1863

    Article  CAS  Google Scholar 

  20. Vainio MJ, Puranen JS, Johnson MS (2009) ShaEP: molecular overlay based on shape and electrostatic potential. J Chem Inf Model 49:492–502

    Article  CAS  Google Scholar 

  21. Cross S, Baroni M, Goracci L, Cruciani G (2012) GRID-based three-dimensional pharmacophores I: FLAPpharm, a novel approach for pharmacophore elucidation. J Chem Inf Model 52:2587–2598

    Article  CAS  Google Scholar 

  22. Leherte L, Vercauteren DP (2012) Smoothed Gaussian molecular fields: an evaluation of molecular alignment problems. Theor Chem Acc 131:1259–1275

    Article  Google Scholar 

  23. Sastry GM, Dixon SL, Sherman W (2011) Rapid shape-based ligand alignment and virtual screening method based on atom/feature-pair similarities and volume overlap scoring. J Chem Inf Model 51:2455–2466

    Article  CAS  Google Scholar 

  24. Jones G, Gao Y, Sage CR (2009) Elucidating molecular overlays from pairwise alignments using a genetic algorithm. J Chem Inf Model 49:1847–1855

    Article  CAS  Google Scholar 

  25. Klabunde T, Giegerich C, Evers A (2012) MARS: computing three-dimensional alignments for multiple ligands using pairwise similarities. J Chem Inf Model 52:2022–2030

    Article  CAS  Google Scholar 

  26. Chan SL, Labute P (2010) Training a scoring function for the alignment of small molecules. J Chem Inf Model 50:1724–1735

    Article  CAS  Google Scholar 

  27. Bron C, Kerbosch J (1973) Algorithm 457—finding all cliques of an undirected graph. Commun ACM 16:575–577

    Article  Google Scholar 

  28. Kabsch W (1978) A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr A 34:827–828

    Article  Google Scholar 

  29. Cross S, Ortuso F, Baroni M, Costa G, Distinto S, Moraca F, Alcaro S, Cruciani G (2012) GRID-based three-dimensional pharmacophores II: pharmbench, a benchmark data set for evaluating pharmacophore elucidation methods. J Chem Inf Model 52:2599–2608

    Article  CAS  Google Scholar 

  30. Giangreco I, Cosgrove DA, Packer MJ (2013) An extensive and diverse set of molecular overlays for the validation of pharmacophore programs. J Chem Inf Model 53:852–866

    Article  CAS  Google Scholar 

  31. The Protein Data Bank. http://www.rcsb.org

  32. Giangreco I, Olsson TSG, Cole JC, Packer MJ (2014) Assessment of a cambridge structural database-driven overlay program. J Chem Inf Model 54:3091–3098

    Article  CAS  Google Scholar 

  33. The RDKit. http://www.rdkit.org, by Greg Landrum et al.

  34. Vainio MJ, Johnson MS (2007) Generating conformer ensembles using a multiobjective genetic algorithm. J Chem Inf Model 47:2462–2474

    Article  CAS  Google Scholar 

  35. Schärfer C, Schulz-Gasch T, Hert J, Heinzerling L, Schulz B, Inhester T, Stahl M, Rarey M (2013) CONFECT: conformations from an expert collection of torsion patterns. ChemMedChem 8:1690–1700

    Article  Google Scholar 

  36. Richmond NJ, Abrams CA, Wolohan PR, Abrahamian E, Willett P, Clark RD (2006) GALAHAD: 1. pharmacophore identification by hypermolecular alignment of ligands in 3D. J Comput Aided Mol Des 20:567–587

    Article  CAS  Google Scholar 

Download references

Acknowledgements

The author would like to thank Dr. Christian Lemmen and the authors of Confect [35], and Dr. Mikko Vainio and other authors of Balloon [34]. This work would have been impossible without free access to these two conformer generators. Other indispensible freeware include the RDKit [33], the Yasara viewer (http://www.yasara.org), and the JMol Viewer (http://jmol.sourceforge.net). The author also thanks Drs. Alex Clark and Bill Myrtle for help with proof-reading.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shek Ling Chan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 361 KB)

Appendix

Appendix

Termination criterion for simultaneous multiple pose generation

After each cycle (each query triplet), the pair pose registers may be updated. The raw score improvement is defined to be the average (over all registers) of the improvement of the score at the fifth slot (i.e., the fifth best score in the register) over this cycle. The first slot is not used because it does not get updated very often (for the first slot to be updated, a new pose with a score better than any of the existing ones must be found). Position 5 is used because each register is guaranteed to have more than five slots, as per Eq. (2). In principle, we would terminate the iteration when the score improvement, SI, becomes too small. We noticed that some cycles produced very small SIs although there was still significant room for improvement. In order to smooth out the fluctuation, we average out the SI of the current cycle with the SIs of a few past cycles to create an effective SI, and use this to determine termination. Let \({n_{curr}}\) be the number of compatible triplets (which equals the number of conformer poses in the grand assembly) encountered in the current cycle. Let \({n_{avg}}\) be the hitherto average number of compatible triplets per cycle (i.e., \({n_{avg}}\) is the sum of all \({n_{curr}}\)′s encountered so far, divided by the number of cycles). The effective SI of the current cycle, \({s_{curr}}\), is given by

$${s_{curr}}=~\frac{{{n_{avg~}}{s_{prev}}~+~{n_{curr~}}{s_0}}}{{{n_{avg}}~+~{n_{curr}}}}$$
(3)

where \({s_{prev}}\) is the effective SI of the previous cycle, and \({s_0}\) is the raw score improvement, as described above. In other words, \({s_{curr}}\) is a weighted average of \({s_{prev}}\) and \({s_0}\). \({s_0}\) is weighted less if there are few compatible triplets in the current cycle. This should make the effective SI, \({s_{curr}}\), less prone to statistical fluctuations than the raw score improvement, \(~{s_0}\).

Iteration terminates when the effective SI drops below a cut-off value, \({s_{cut}}\). \({s_{cut}}\) is set to a value of 0.01. To put this in perspective, the pairwise score (according to any scoring scheme) of one hydrophobic atom overlapping on another is 2.0.

For the first cycle, the value of \({s_{prev}}\) in Eq. (3) is set to \({2^5} \times {s_{cut}}\). 5 is the index of the slot used for determining the raw score improvement, \({s_0}\), as mentioned above. This way, if \({s_0}\) is 0 and \({n_{curr}}\) remains unchanged over the cycles (implying that \({n_{avg}}\) always equals \({n_{curr}}\)), the iteration would go on for 5 cycles before terminating. Put in another way, if only one slot of each pair pose register is filled per iteration cycle, there is time for the register to be filled to the fifth slot, where \({s_0}\) is determined.

Choosing random query triplets for simultaneous multiple pose generation

Early in our research, we simply chose a random triplet from a random conformer of a random ligand. We found this inefficient. Since we would like each pair pose register to be at least partially filled, we can bias the choice of the ligands to encourage ligands with empty registers to be chosen. First, ligands having fewer than three representative points are not involved, because they do not have representative triplets. An array is constructed out of the indices of all ligands having at least three representative points. Each index appears once. We then go through all ligand pairs. For each ligand pair whose pair pose register is still empty, three copies of the index of each ligand in the pair are inserted into the array. (Experiments showed that using five instead of three has little effect on the results.) Finally, a random element from this array is selected to give a random ligand. Then a random conformer of this ligand is chosen, and a random triplet of representative points of this conformer is chosen. Similar to the systematic pose generation method, triplets that are geometrically too small are never chosen. The diameter (see “Systematic pose generation” section) of a chosen triplet has to be at least half that of the smallest diameter amongst the representative point sets of all conformers of all ligands.

Starting points for multiple ligand alignment

As described in the section “Generating multiple ligand assemblies”, one base template conformer gives rise to one starting point assembly, and subsequently one solution assembly. We discovered that using a maximum of 250 starting points strikes a balance between good results and speed. Before the iteration begins, each conformer of each ligand is used as a base template to generate a starting assembly. Given one base template conformer, for each of the other ligands, the best scoring pairwise alignment pose between the ligand and the base template is used to build the starting assembly. It is possible that no pairwise alignment pose exists in the pair pose register between the base template conformer and the ligand. If this is the case, the missing ligand count of this base template will be increased by one. After all the non-missing ligands have been positioned in the assembly, each missing ligand will be positioned using the non-missing ligands as helpers, one at a time. For each missing ligand, amongst all poses generated using the helpers, the one resulting in the best score for the assembly (all non-missing ligands plus the current ligand) is used. Any ligands still missing after this step will be positioned randomly. The multiple ligand alignment score of the starting point assembly will then be calculated, using fast scoring. The full list of starting point assemblies will then be sorted, first by the missing ligand count, then by the multiple ligand alignment score. If there are more than 250 assemblies, the 250 with the smallest missing ligand counts and the best scores are used as starting points. In a real run, there is of course no need to patch the missing ligands if there are already enough starting points without any missing ligands.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Chan, S.L. MolAlign: an algorithm for aligning multiple small molecules. J Comput Aided Mol Des 31, 523–546 (2017). https://doi.org/10.1007/s10822-017-0023-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10822-017-0023-8

Keywords

Navigation