MolAlign: an algorithm for aligning multiple small molecules

Chan, Shek Ling

doi:10.1007/s10822-017-0023-8

MolAlign: an algorithm for aligning multiple small molecules

Published: 01 June 2017

Volume 31, pages 523–546, (2017)
Cite this article

Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

Shek Ling Chan ORCID: orcid.org/0000-0001-7062-1134¹

913 Accesses
10 Citations
Explore all metrics

Abstract

In small molecule drug discovery projects, the receptor structure is not always available. In such cases it is enormously useful to be able to align known ligands in the way they bind in the receptor. Here we shall present an algorithm for the alignment of multiple small molecule ligands. This algorithm takes pre-generated conformers as input, and proposes aligned assemblies of the ligands. The algorithm consists of two stages: the first stage is to perform alignments for each pair of ligands, the second stage makes use of the results from the first stage to build up multiple ligand alignment assemblies using a novel iterative procedure. The scoring functions are improved versions of the one mentioned in our previous work. We have compared our results with some recent publications. While an exact comparison is impossible, it is clear that our algorithm is fast and produces very competitive results.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes

Article 29 August 2022

Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools

Article Open access 21 February 2023

ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

Article Open access 13 March 2017

References

Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular field analysis (CoMFA). 1. effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
Article CAS Google Scholar
Lemmen C, Langauer T (2000) Computational methods for the structural alignment of molecules. J Comput Aided Mol Des 14:215–232
Article CAS Google Scholar
Wolber G, Seidel T, Bendix F, Langer T (2008) Molecule-pharmacophore superpositioning and pattern matching in computational drug design. Drug Discov Today 13:23–39
Article CAS Google Scholar
Leach AR, Gillet VJ, Lewis RA, Taylor R (2010) Three-dimensional pharmacophore methods in drug discovery. J Med Chem 53:539–558
Article CAS Google Scholar
Jones G, Willett P, Glen RC (1995) A genetic algorithm for flexible molecular overlay and pharmacophore elucidation. J Comput Aided Mol Des 9:532–549
Article CAS Google Scholar
Barnum D, Greene J, Smellie A, Sprague P (1996) Identification of common functional configurations among molecules. J Chem Inf Comput Sci 36:563–571
Article CAS Google Scholar
Lemmen C, Lengauer T, Klebe G (1998) FLEXS: a method for fast flexible ligand superposition. J Med Chem 41:4502–4520
Article CAS Google Scholar
Labute P, Williams C, Feher M, Sourial E, Schmidt JM (2001) Flexible alignment of small molecules. J Med Chem 44:1483–1490
Article CAS Google Scholar
ROCS, from OpenEye Scientific Software, http://www.eyesopen.com
Dixon SL, Smondyrev AM, Knoll EH, Rao SN, Shaw DE, Friesner RA (2006) PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20:647–671
Article CAS Google Scholar
Gardiner EJ, Cosgrove DA, Taylor R, Gillet VJ (2009) Multiobjective optimization of pharmacophore hypotheses: bias toward low-energy conformations. J Chem Inf Model 49:2761–2773
Article CAS Google Scholar
Jones G (2010) GAPE: an improved genetic algorithm for pharmacophore elucidation. J Chem Inf Model 50:2001–2018
Article CAS Google Scholar
Taylor R, Cole JC, Cosgrove DA, Gardiner EJ, Gillet VJ, Korb O (2012) Development and validation of an improved algorithm for overlaying flexible molecules. J Comput Aided Mol Des 26:451–472
Article CAS Google Scholar
Korb O, Monecke P, Hessler G, Stützle T, Exner TE (2010) pharmACOphore: multiple flexible ligand alignment based on ant colony optimization. J Chem Inf Model 50:1669–1681
Article CAS Google Scholar
Moser D, Wittmann SK, Kramer J, Blöcher R, Achenbach J, Pogoryelov D, Proschak E (2015) PENG: a neural gas-based approach for pharmacophore elucidation. Method design, validation, and virtual screening for novel ligands of LTA4H. J Chem Inf Model 55:284–293
Article CAS Google Scholar
Shin W, Hyun SA, Chae CH, Chon JK (2009) Flexible alignment of small molecules using the penalty method. J Chem Inf Model 49:1879–1888
Article CAS Google Scholar
Tosco P, Balle T, Shiri F (2011) Open3DALIGN: an open-source software aimed at unsupervised ligand alignment. J Comput Aided Mol Des 25:777–783
Article CAS Google Scholar
Diller DJ, Connell ND, Welsh WJ (2015) Avalanche for shape and feature-based virtual screening with 3D alignment. J Comput Aided Mol Des 29:1015–1024
Article CAS Google Scholar
Kawabata T, Nakamura H (2014) 3D flexible alignment using 2D maximum common substructure: dependence of prediction accuracy on target-reference chemical similarity. J Chem Inf Model 54:1850–1863
Article CAS Google Scholar
Vainio MJ, Puranen JS, Johnson MS (2009) ShaEP: molecular overlay based on shape and electrostatic potential. J Chem Inf Model 49:492–502
Article CAS Google Scholar
Cross S, Baroni M, Goracci L, Cruciani G (2012) GRID-based three-dimensional pharmacophores I: FLAPpharm, a novel approach for pharmacophore elucidation. J Chem Inf Model 52:2587–2598
Article CAS Google Scholar
Leherte L, Vercauteren DP (2012) Smoothed Gaussian molecular fields: an evaluation of molecular alignment problems. Theor Chem Acc 131:1259–1275
Article Google Scholar
Sastry GM, Dixon SL, Sherman W (2011) Rapid shape-based ligand alignment and virtual screening method based on atom/feature-pair similarities and volume overlap scoring. J Chem Inf Model 51:2455–2466
Article CAS Google Scholar
Jones G, Gao Y, Sage CR (2009) Elucidating molecular overlays from pairwise alignments using a genetic algorithm. J Chem Inf Model 49:1847–1855
Article CAS Google Scholar
Klabunde T, Giegerich C, Evers A (2012) MARS: computing three-dimensional alignments for multiple ligands using pairwise similarities. J Chem Inf Model 52:2022–2030
Article CAS Google Scholar
Chan SL, Labute P (2010) Training a scoring function for the alignment of small molecules. J Chem Inf Model 50:1724–1735
Article CAS Google Scholar
Bron C, Kerbosch J (1973) Algorithm 457—finding all cliques of an undirected graph. Commun ACM 16:575–577
Article Google Scholar
Kabsch W (1978) A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr A 34:827–828
Article Google Scholar
Cross S, Ortuso F, Baroni M, Costa G, Distinto S, Moraca F, Alcaro S, Cruciani G (2012) GRID-based three-dimensional pharmacophores II: pharmbench, a benchmark data set for evaluating pharmacophore elucidation methods. J Chem Inf Model 52:2599–2608
Article CAS Google Scholar
Giangreco I, Cosgrove DA, Packer MJ (2013) An extensive and diverse set of molecular overlays for the validation of pharmacophore programs. J Chem Inf Model 53:852–866
Article CAS Google Scholar
The Protein Data Bank. http://www.rcsb.org
Giangreco I, Olsson TSG, Cole JC, Packer MJ (2014) Assessment of a cambridge structural database-driven overlay program. J Chem Inf Model 54:3091–3098
Article CAS Google Scholar
The RDKit. http://www.rdkit.org, by Greg Landrum et al.
Vainio MJ, Johnson MS (2007) Generating conformer ensembles using a multiobjective genetic algorithm. J Chem Inf Model 47:2462–2474
Article CAS Google Scholar
Schärfer C, Schulz-Gasch T, Hert J, Heinzerling L, Schulz B, Inhester T, Stahl M, Rarey M (2013) CONFECT: conformations from an expert collection of torsion patterns. ChemMedChem 8:1690–1700
Article Google Scholar
Richmond NJ, Abrams CA, Wolohan PR, Abrahamian E, Willett P, Clark RD (2006) GALAHAD: 1. pharmacophore identification by hypermolecular alignment of ligands in 3D. J Comput Aided Mol Des 20:567–587
Article CAS Google Scholar

Download references

Acknowledgements

The author would like to thank Dr. Christian Lemmen and the authors of Confect [35], and Dr. Mikko Vainio and other authors of Balloon [34]. This work would have been impossible without free access to these two conformer generators. Other indispensible freeware include the RDKit [33], the Yasara viewer (http://www.yasara.org), and the JMol Viewer (http://jmol.sourceforge.net). The author also thanks Drs. Alex Clark and Bill Myrtle for help with proof-reading.

Author information

Authors and Affiliations

Cascade Scientific, Vancouver, B.C., V6L 1M3, Canada
Shek Ling Chan

Authors

Shek Ling Chan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shek Ling Chan.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 361 KB)

Appendix

Termination criterion for simultaneous multiple pose generation

After each cycle (each query triplet), the pair pose registers may be updated. The raw score improvement is defined to be the average (over all registers) of the improvement of the score at the fifth slot (i.e., the fifth best score in the register) over this cycle. The first slot is not used because it does not get updated very often (for the first slot to be updated, a new pose with a score better than any of the existing ones must be found). Position 5 is used because each register is guaranteed to have more than five slots, as per Eq. (2). In principle, we would terminate the iteration when the score improvement, SI, becomes too small. We noticed that some cycles produced very small SIs although there was still significant room for improvement. In order to smooth out the fluctuation, we average out the SI of the current cycle with the SIs of a few past cycles to create an effective SI, and use this to determine termination. Let ${n_{curr}}$ be the number of compatible triplets (which equals the number of conformer poses in the grand assembly) encountered in the current cycle. Let ${n_{avg}}$ be the hitherto average number of compatible triplets per cycle (i.e., ${n_{avg}}$ is the sum of all ${n_{curr}}$′s encountered so far, divided by the number of cycles). The effective SI of the current cycle, ${s_{curr}}$, is given by

$${s_{curr}}=~\frac{{{n_{avg~}}{s_{prev}}~+~{n_{curr~}}{s_0}}}{{{n_{avg}}~+~{n_{curr}}}}$$

(3)

where ${s_{prev}}$ is the effective SI of the previous cycle, and ${s_0}$ is the raw score improvement, as described above. In other words, ${s_{curr}}$ is a weighted average of ${s_{prev}}$ and ${s_0}$. ${s_0}$ is weighted less if there are few compatible triplets in the current cycle. This should make the effective SI, ${s_{curr}}$, less prone to statistical fluctuations than the raw score improvement, $~{s_0}$.

Iteration terminates when the effective SI drops below a cut-off value, ${s_{cut}}$. ${s_{cut}}$ is set to a value of 0.01. To put this in perspective, the pairwise score (according to any scoring scheme) of one hydrophobic atom overlapping on another is 2.0.

For the first cycle, the value of ${s_{prev}}$ in Eq. (3) is set to ${2^5} \times {s_{cut}}$. 5 is the index of the slot used for determining the raw score improvement, ${s_0}$, as mentioned above. This way, if ${s_0}$ is 0 and ${n_{curr}}$ remains unchanged over the cycles (implying that ${n_{avg}}$ always equals ${n_{curr}}$), the iteration would go on for 5 cycles before terminating. Put in another way, if only one slot of each pair pose register is filled per iteration cycle, there is time for the register to be filled to the fifth slot, where ${s_0}$ is determined.

Choosing random query triplets for simultaneous multiple pose generation

Early in our research, we simply chose a random triplet from a random conformer of a random ligand. We found this inefficient. Since we would like each pair pose register to be at least partially filled, we can bias the choice of the ligands to encourage ligands with empty registers to be chosen. First, ligands having fewer than three representative points are not involved, because they do not have representative triplets. An array is constructed out of the indices of all ligands having at least three representative points. Each index appears once. We then go through all ligand pairs. For each ligand pair whose pair pose register is still empty, three copies of the index of each ligand in the pair are inserted into the array. (Experiments showed that using five instead of three has little effect on the results.) Finally, a random element from this array is selected to give a random ligand. Then a random conformer of this ligand is chosen, and a random triplet of representative points of this conformer is chosen. Similar to the systematic pose generation method, triplets that are geometrically too small are never chosen. The diameter (see “Systematic pose generation” section) of a chosen triplet has to be at least half that of the smallest diameter amongst the representative point sets of all conformers of all ligands.

Starting points for multiple ligand alignment

As described in the section “Generating multiple ligand assemblies”, one base template conformer gives rise to one starting point assembly, and subsequently one solution assembly. We discovered that using a maximum of 250 starting points strikes a balance between good results and speed. Before the iteration begins, each conformer of each ligand is used as a base template to generate a starting assembly. Given one base template conformer, for each of the other ligands, the best scoring pairwise alignment pose between the ligand and the base template is used to build the starting assembly. It is possible that no pairwise alignment pose exists in the pair pose register between the base template conformer and the ligand. If this is the case, the missing ligand count of this base template will be increased by one. After all the non-missing ligands have been positioned in the assembly, each missing ligand will be positioned using the non-missing ligands as helpers, one at a time. For each missing ligand, amongst all poses generated using the helpers, the one resulting in the best score for the assembly (all non-missing ligands plus the current ligand) is used. Any ligands still missing after this step will be positioned randomly. The multiple ligand alignment score of the starting point assembly will then be calculated, using fast scoring. The full list of starting point assemblies will then be sorted, first by the missing ligand count, then by the multiple ligand alignment score. If there are more than 250 assemblies, the 250 with the smallest missing ligand counts and the best scores are used as starting points. In a real run, there is of course no need to patch the missing ligands if there are already enough starting points without any missing ligands.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chan, S.L. MolAlign: an algorithm for aligning multiple small molecules. J Comput Aided Mol Des 31, 523–546 (2017). https://doi.org/10.1007/s10822-017-0023-8

Download citation

Received: 12 January 2017
Accepted: 08 May 2017
Published: 01 June 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10822-017-0023-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

MolAlign: an algorithm for aligning multiple small molecules

Abstract

Access this article

Similar content being viewed by others

US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes

Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools

ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 361 KB)

Appendix

Termination criterion for simultaneous multiple pose generation

Choosing random query triplets for simultaneous multiple pose generation

Starting points for multiple ligand alignment

Rights and permissions

About this article

Cite this article

Keywords

Navigation

MolAlign: an algorithm for aligning multiple small molecules

Abstract

Access this article

Similar content being viewed by others

US-align: universal structure alignments of proteins, nucleic acids, and macromolecular complexes

Sunsetting Binding MOAD with its last data update and the addition of 3D-ligand polypharmacology tools

ForceGen 3D structure and conformer generation: from small lead-like molecules to macrocyclic drugs

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Electronic supplementary material

Supplementary material 1 (DOCX 361 KB)

Appendix

Appendix

Termination criterion for simultaneous multiple pose generation

Choosing random query triplets for simultaneous multiple pose generation

Starting points for multiple ligand alignment

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation