
Testing Different Nonsmooth Formulations of the Lennard–Jones Potential in Atomic Clustering Problems

Journal of Optimization Theory and Applications

Abstract

A cluster is a group of identical molecules or atoms loosely bound by inter-atomic forces. The optimal geometry minimises the potential energy—usually modelled as the Lennard–Jones potential—of the cluster. The minimisation of the Lennard–Jones potential is a very difficult global optimisation problem with an enormous number of local minima. In addition to cluster problems, the Lennard–Jones potential represents an important component in many of the potential energy models used, for example, in protein folding, protein–peptide docking, and complex molecular conformation problems. In this paper, we study different modifications of the Lennard–Jones potential in order to improve the success rate of finding the global minimum of the original potential. The main interest of the paper is in the nonsmooth penalised form of the Lennard–Jones potential. The preliminary numerical experiments confirm that the success rate of finding the global minimum is clearly improved when the new formulae are used.


Notes

  1. We also tried a cluster \(N=75\) that is considered to be a difficult cluster to solve (see, e.g. [7]), but with our randomly selected starting points, no trial led to the putative global minimum.

  2. The minimum obtained with \(N=13\) was smaller than that given in [19].

References

  1. Kostrowicki, J., Piela, L., Cherayil, B.J., Scheraga, H.A.: Performance of the diffusion equation method in searches for optimum structures of clusters of Lennard–Jones atoms. J. Phys. Chem. 95, 4113–4119 (1991)


  2. Wales, D.J.: Rearrangements of 55-atom Lennard–Jones and (C60)55 clusters. J. Chem. Phys. 101, 3750–3762 (1994)


  3. Yeak, S.H., Ng, T.Y., Liew, K.M.: Multiscale modeling of carbon nanotubes under axial tension and compression. Phys. Rev. B 72(16), 165401 (2005). doi:10.1103/PhysRevB.72.165401

  4. Lampariello, F., Liuzzi, G.: Global optimization of protein-peptide docking by a filling function method. J. Optim. Theory Appl. 164, 1090–1108 (2015)


  5. Leach, A.R.: Molecular Modelling: Principles and Applications, 2nd edn. Pearson Education Limited, Harlow (2001)


  6. Neumaier, A.: Molecular modeling of proteins and mathematical prediction of protein structure. SIAM Rev. 39, 407–460 (1997)


  7. Locatelli, M., Schoen, F.: Fast global optimization of difficult Lennard–Jones clusters. Comput. Optim. Appl. 21, 55–70 (2002)


  8. Beliakov, G., Monsalve Tobon, J.E., Bagirov, A.M.: Parallelization of the discrete gradient method of non-smooth optimization and its applications. In: Sloot, et al. (eds.) Computational Science—ICCS 2003, Lecture Notes in Computer Science, pp. 592–601. Springer, Berlin (2003)

  9. Bagirov, A.M., Karasozen, B., Sezer, M.: Discrete gradient method: a derivative free method for nonsmooth optimization. J. Optim. Theory Appl. 137, 317–334 (2008)


  10. Karmitsa, N., Bagirov, A.: Limited memory discrete gradient bundle method for nonsmooth derivative free optimization. Optimization 61(12), 1491–1509 (2012)


  11. Haarala, M., Miettinen, K., Mäkelä, M.M.: New limited memory bundle method for large-scale nonsmooth optimization. Optim. Methods Softw. 19(6), 673–692 (2004)


  12. Haarala, N., Miettinen, K., Mäkelä, M.M.: Globally convergent limited memory bundle method for large-scale nonsmooth optimization. Math. Program. 109(1), 181–205 (2007)


  13. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New York (1983)


  14. Bagirov, A.M., Karmitsa, N., Mäkelä, M.M.: Introduction to Nonsmooth Optimization: Theory, Practice and Software. Springer, Berlin (2014)


  15. Karmitsa, N., Bagirov, A., Mäkelä, M.M.: Comparing different nonsmooth optimization methods and software. Optim. Methods Softw. 27(1), 131–153 (2012)


  16. Doye, J.P.K.: The effect of compression on the global optimization of atomic clusters. Phys. Rev. E 62, 8753–8761 (2000)


  17. Locatelli, M., Schoen, F.: Efficient algorithms for large scale global optimization. Comput. Optim. Appl. 26, 173–190 (2003)


  18. Vlček, J., Lukšan, L.: Globally convergent variable metric method for nonconvex nondifferentiable unconstrained minimization. J. Optim. Theory Appl. 111(2), 407–430 (2001)


  19. Leary, R.H.: Global optima of Lennard–Jones clusters. J. Global Optim. 11, 35–53 (1997)


  20. Bagirov, A.M., Ugon, J., Mirzayeva, H.G.: Nonsmooth optimization algorithm for solving clusterwise linear regression problem. J. Optim. Theory Appl. 164, 755–780 (2015)


  21. Lemaréchal, C., Strodiot, J.J., Bihain, A.: On a bundle algorithm for nonsmooth optimization. In: Mangasarian, O.L., Mayer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, pp. 245–281. Academic Press, New York (1981)


  22. Mifflin, R.: A modification and an extension of Lemaréchal’s algorithm for nonsmooth minimization. Math. Program. Study 17, 77–90 (1982)


  23. Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)


  24. Haarala, M.: Large-Scale Nonsmooth Optimization: Variable Metric Bundle Method with Limited Memory. Ph.D. thesis, University of Jyväskylä, Department of Mathematical Information Technology (2004)


Acknowledgments

The work was financially supported by the University of Turku (Finland) and the Academy of Finland (Project No. 289500).


Corresponding author

Correspondence to Napsu Karmitsa.

Additional information

Communicated by Christodoulos Floudas.

Appendix


Limited Memory Discrete Gradient Bundle Method. We now describe the basic ideas of the derivative-free LDGB that is used as a solver for the minimisation problem discussed in the paper. As said in the introduction, we need a solver that is capable of solving large-scale nonsmooth nonconvex problems. Moreover, the computation of subgradients is not straightforward since the problem is subdifferentially irregular and, thus, we need a derivative-free solver. The only assumptions made are that the objective function is locally Lipschitz continuous and that, at every point \({\mathbf {x}}\in \mathbb {R}^n\), we can evaluate the value of the objective function \(f({\mathbf {x}})\). A flow chart of the method is given in Fig. 4.

Discrete Gradient. We start by defining the discrete gradient. Let us denote by

$$\begin{aligned} S_1 := \{{\mathbf {g}}\in \mathbb {R}^n \,:\, \Vert {\mathbf {g}}\Vert = 1\} \end{aligned}$$

the sphere of the unit ball and by

$$\begin{aligned} P := \{z \,:\, z(\delta )>0 \text { for } \delta >0, ~\delta ^{-1} z(\delta ) \rightarrow 0 \text { when } \delta \rightarrow 0\} \end{aligned}$$

the set of univariate positive infinitesimal functions. In addition, let

$$\begin{aligned} G := \{{\mathbf {e}}\in \mathbb {R}^n \,:\, {\mathbf {e}}= (e_1,\ldots ,e_n),~ |e_j| = 1,~ j=1,\ldots ,n\} \end{aligned}$$

be the set of all vertices of the unit hypercube in \(\mathbb {R}^n\).

Now, take any \({\mathbf {g}}\in S_1\), \({\mathbf {e}}\in G\), \(z \in P\), \(\alpha \in (0,1]\), and compute \(i = \text{ argmax }\,\{|g_j|, ~j=1,\ldots ,n\}\). For \({\mathbf {e}}\in G\), define the sequence of n vectors \({\mathbf {e}}^j(\alpha ) := (\alpha e_1,\alpha ^2 e_2,\ldots ,\alpha ^j e_j,0,\ldots ,0)\), \(j=1,\ldots ,n\), and for \({\mathbf {x}}\in \mathbb {R}^n\) and \(\delta > 0\) consider the points

$$\begin{aligned} {\mathbf {x}}_0={\mathbf {x}}+ \delta {\mathbf {g}}, \qquad {\mathbf {x}}_j = {\mathbf {x}}_0 + z(\delta ) {\mathbf {e}}^j(\alpha ), \qquad j=1,\ldots ,n. \end{aligned}$$

Definition 7.1

The discrete gradient of the function \(f\) at the point \({\mathbf {x}}\in \mathbb {R}^n\) is the vector \(\Gamma ^i({\mathbf {x}},{\mathbf {g}},{\mathbf {e}},z,\alpha ,\delta ) \in \mathbb {R}^n\) with the following coordinates:

$$\begin{aligned} \Gamma _j^i&:= [z(\delta )\alpha ^j e_j]^{-1} \left[ f({\mathbf {x}}_j) - f({\mathbf {x}}_{j-1}) \right] , \qquad j = 1,\ldots ,n,~~j \ne i,\\ \Gamma _i^i&:= (\delta g_i)^{-1} \left[ f({\mathbf {x}}+\delta {\mathbf {g}})- f({\mathbf {x}})- \delta \sum _{j=1,j\ne i}^n \Gamma _j^i g_j \right] . \end{aligned}$$

The closed convex set of discrete gradients

$$\begin{aligned} V_0({\mathbf {x}},\delta ) := \mathrm {cl}\,\mathrm {conv}\,\{{\mathbf {v}}\in \mathbb {R}^n \,:\, \exists \,{\mathbf {g}}\in S_1,~{\mathbf {e}}\in G,~z \in P \text { such that } {\mathbf {v}}= \Gamma ^i({\mathbf {x}},{\mathbf {g}},{\mathbf {e}},z,\alpha ,\delta )\} \end{aligned}$$

is an approximation to the subdifferential \(\partial f({\mathbf {x}})\) for sufficiently small \(\delta > 0\) [9, 14].
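To make Definition 7.1 concrete, the following sketch (plain NumPy, not the authors' implementation) computes a single discrete gradient using only function values; for a smooth test function it should approximate the ordinary gradient when \(\delta \) is small.

```python
import numpy as np

def discrete_gradient(f, x, g, e, delta, alpha, z):
    """Discrete gradient of f at x (Definition 7.1), using only function
    values: g in S_1, e a vertex of the unit hypercube, z infinitesimal."""
    n = len(x)
    i = int(np.argmax(np.abs(g)))          # i = argmax |g_j|
    zd = z(delta)
    # Points x_0 = x + delta*g and x_j = x_0 + z(delta) * e^j(alpha).
    pts = [x + delta * g]
    for j in range(1, n + 1):
        ej = np.zeros(n)
        ej[:j] = alpha ** np.arange(1, j + 1) * e[:j]
        pts.append(pts[0] + zd * ej)
    gamma = np.zeros(n)
    for j in range(1, n + 1):              # coordinates j != i
        if j - 1 != i:
            gamma[j - 1] = (f(pts[j]) - f(pts[j - 1])) / (zd * alpha ** j * e[j - 1])
    # The i-th coordinate makes the vector consistent with the
    # difference of f along the direction g.
    rest = delta * sum(gamma[j] * g[j] for j in range(n) if j != i)
    gamma[i] = (f(x + delta * g) - f(x) - rest) / (delta * g[i])
    return gamma

# For a smooth function the discrete gradient approximates the gradient:
f = lambda v: v @ v                        # grad f(x) = 2x
x = np.array([1.0, -2.0, 0.5])
dg = discrete_gradient(f, x, np.array([1.0, 0.0, 0.0]), np.ones(3),
                       delta=1e-3, alpha=1.0, z=lambda d: d ** 2)
```

Here \(z(\delta )=\delta ^2\) and \(\alpha =1\) are illustrative choices; `dg` agrees with \(2{\mathbf {x}}\) up to terms of order \(\delta \).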

Outer and Inner Iterations. Both outer and inner iterations are used in the LDGB: the inner iteration of the LDGB is essentially the same as the limited memory bundle method [11, 12], but now we use discrete gradients instead of subgradients of the objective function. The outer iteration is used in order to avoid too tight approximations to the subgradients at the beginning of the computation (thus, we have a derivative-free method). That is, we start with a "large" \(\delta \) and make it smaller as we approach the optimum.

Search Direction. As already said, we use discrete gradients instead of subgradients in our calculations, and the search direction \({\mathbf {d}}_k\) is calculated using the limited memory approach. That is,

$$\begin{aligned} {\mathbf {d}}_k= -D^k \tilde{{\mathbf {v}}}_k, \end{aligned}$$

where \(\tilde{{\mathbf {v}}}_k\) is (an aggregate) discrete gradient, and \(D^k\) is the limited memory variable metric update that, in the smooth case, represents the approximation of the inverse of the Hessian matrix. Note that the matrix \(D^k\) is not formed explicitly, but the search direction \({\mathbf {d}}_k\) is calculated using the limited memory approach (to be described later).

Line Search. In order to determine a new step along the search direction \({\mathbf {d}}_k\), the LDGB uses the so-called line search procedure (see [12, 18]): a new iteration point \({\mathbf {x}}_{k+1}\) and a new auxiliary point \({\mathbf {y}}_{k+1}\) are produced such that

$$\begin{aligned} {\mathbf {x}}_{k+1}&:= {\mathbf {x}}_k + t_L^k {\mathbf {d}}_k \qquad \text {and} \\ {\mathbf {y}}_{k+1}&:= {\mathbf {x}}_k + t_R^k {\mathbf {d}}_k, \qquad k \ge 1, \end{aligned}$$

with \({\mathbf {y}}_1={\mathbf {x}}_1\), where \(t_R^k \in (0,t_{max}]\) and \(t_L^k \in [0,t_R^k]\) are step sizes, and \(t_{max}>1\) is the upper bound for the step size. A necessary condition for a serious step is to have

$$\begin{aligned} t_R^k = t_L^k > 0 \qquad \text {and} \qquad f({\mathbf {y}}_{k+1}) \le f({\mathbf {x}}_k) - \varepsilon _L^k t_R^k w_k, \end{aligned}$$
(8)

where \(\varepsilon _L^k \in (0,1/2)\) is a line search parameter, and \(w_k>0\) represents the desirable amount of descent of f at \({\mathbf {x}}_k\). If the condition (8) is satisfied, we set \({\mathbf {x}}_{k+1} = {\mathbf {y}}_{k+1}\) and a serious step is taken.

On the other hand, a null step is taken if

$$\begin{aligned} t_R^k > t_L^k = 0 \qquad \text {and} \qquad - \beta _{k+1} + {\mathbf {d}}_k^T {\mathbf {v}}_{k + 1} \ge - \varepsilon _R^k w_k, \end{aligned}$$

where \(\varepsilon _R^k \in (\varepsilon _L^k,1/2)\) is a line search parameter and \({\mathbf {v}}_{k+1} \in V_0({\mathbf {y}}_{k+1},\delta _k)\). Moreover, \(\beta _{k+1}\) is the subgradient locality measure [21, 22] similar to that used in standard bundle methods, that is,

$$\begin{aligned} \beta _{k+1}&:= \max \left\{ |(f({\mathbf {x}}_k) - f({\mathbf {y}}_{k+1}) + ({\mathbf {y}}_{k+1} - {\mathbf {x}}_k)^T {\mathbf {v}}_{k+1})|,\, \gamma ||{\mathbf {y}}_{k+1} - {\mathbf {x}}_k||^2 \, \right\} . \end{aligned}$$

Here, \(\gamma \ge 0\) is a distance measure parameter supplied by the user. Parameter \(\gamma \) can be set to zero when f is convex. In the case of a null step, we set \({\mathbf {x}}_{k+1} = {\mathbf {x}}_k\), but information about the objective function is increased because we store the auxiliary point \({\mathbf {y}}_{k+1}\) and the corresponding auxiliary discrete gradient \({\mathbf {v}}_{k+1} \in V_0({\mathbf {y}}_{k+1},\delta _k)\).

Under some semismoothness assumptions, the line search procedure used with the LDGB is guaranteed to find the step sizes \(t_L^k\) and \(t_R^k\) such that exactly one of the two possibilities—a serious step or a null step—occurs [18].
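The two acceptance tests above can be collected in a small hypothetical helper (the names and signature are illustrative, not from the LDGB implementation):

```python
def classify_step(f_xk, f_yk1, tL, tR, eps_L, eps_R, wk, beta_next, d_dot_v):
    """Classify the outcome of one line search: condition (8) for a
    serious step, the locality-measure test for a null step."""
    if tR == tL > 0 and f_yk1 <= f_xk - eps_L * tR * wk:
        return "serious"                    # set x_{k+1} = y_{k+1}
    if tR > tL == 0 and -beta_next + d_dot_v >= -eps_R * wk:
        return "null"                       # x_{k+1} = x_k, store y_{k+1}
    return "rejected"                       # keep searching for step sizes
```

For instance, with \(f({\mathbf {x}}_k)=10\), \(f({\mathbf {y}}_{k+1})=9\), \(t_L^k=t_R^k=0.5\), \(\varepsilon _L^k=0.25\), and \(w_k=1\), the sufficient decrease test \(9 \le 10 - 0.25\cdot 0.5\cdot 1\) holds and a serious step is taken.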

Aggregation. The LDGB uses the original discrete gradient \({\mathbf {v}}_k\) after a serious step and the aggregate discrete gradient \(\tilde{{\mathbf {v}}}_k\) after a null step for direction finding (i.e. we set \(\tilde{{\mathbf {v}}}_k={\mathbf {v}}_k\) if the previous step was a serious step). The aggregation procedure is carried out by determining multipliers \(\lambda ^k_{i}\) satisfying \(\lambda ^k_{i} \ge 0\) for all \(i \in \{1,2,3\}\) and \(\sum _{i=1}^3 \lambda ^k_{i}=1\) that minimise the simple quadratic function

$$\begin{aligned} \varphi (\lambda _1, \lambda _2, \lambda _3)&= [\lambda _1 {\mathbf {v}}_m + \lambda _2 {\mathbf {v}}_{k+1} + \lambda _3\tilde{{\mathbf {v}}}_k\,]^T D^k [\lambda _1 {\mathbf {v}}_m + \lambda _2 {\mathbf {v}}_{k+1} + \lambda _3 \tilde{{\mathbf {v}}}_k\,] \\&\qquad + 2(\lambda _2 \beta _{k+1} + \lambda _3 \tilde{\beta }_k).\nonumber \end{aligned}$$

Here, \({\mathbf {v}}_m \in V_0({\mathbf {x}}_k,\delta _k)\) is the current discrete gradient (m denotes the index of the iteration after the latest serious step, i.e. \({\mathbf {x}}_k = {\mathbf {x}}_m\)), \({\mathbf {v}}_{k+1} \in V_0({\mathbf {y}}_{k+1},\delta _k)\) is the auxiliary discrete gradient, and \(\tilde{{\mathbf {v}}}_k\) is the current aggregate discrete gradient from the previous iteration (\(\tilde{{\mathbf {v}}}_1={\mathbf {v}}_1\)). In addition, \(\beta _{k+1}\) is the current subgradient locality measure, and \(\tilde{\beta }_k\) is the current aggregate subgradient locality measure (\(\tilde{\beta }_1=0\)). The optimal values \(\lambda ^k_{i}\), \(i \in \{1,2,3\}\) can be calculated by using simple formulae (see [18]).

The resulting aggregate discrete gradient \(\tilde{{\mathbf {v}}}_{k+1}\) and aggregate subgradient locality measure \(\tilde{\beta }_{k+1}\) are computed by the formulae

$$\begin{aligned} \tilde{{\mathbf {v}}}_{k+1} = \lambda ^k_{1} {\mathbf {v}}_m + \lambda ^k_{2} {\mathbf {v}}_{k+1} + \lambda ^k_{3} \tilde{{\mathbf {v}}}_k \qquad \text {and} \qquad \tilde{\beta }_{k+1} = \lambda ^k_{2} \beta _{k+1} + \lambda ^k_{3} \tilde{\beta }_k. \end{aligned}$$

Due to this simple aggregation procedure, only one trial point \({\mathbf {y}}_{k+1}\) and the corresponding discrete gradient \({\mathbf {v}}_{k+1} \in V_0({\mathbf {y}}_{k+1},\delta _k)\) need to be stored.

The aggregation procedure makes it possible to retain global convergence without solving the rather complicated quadratic direction finding problem (see, e.g. [14]) appearing in standard bundle methods. Note that the aggregate values are computed only if the last step was a null step. Otherwise, we set \(\tilde{{\mathbf {v}}}_{k+1} = {\mathbf {v}}_{k+1}\) and \(\tilde{\beta }_{k+1} =0\).
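For illustration, the three-multiplier quadratic subproblem can be solved approximately by brute force over the unit simplex; the LDGB itself uses the simple closed-form formulae of [18], so the grid search below is only a stand-in:

```python
import numpy as np

def aggregate(v_m, v_next, v_tilde, beta_next, beta_tilde, D, steps=200):
    """Minimise phi(lambda_1, lambda_2, lambda_3) over the unit simplex
    by grid search (illustration only; [18] gives closed-form formulae)."""
    grid = np.linspace(0.0, 1.0, steps + 1)
    best, best_lam = np.inf, None
    for l1 in grid:
        for l2 in grid:
            l3 = 1.0 - l1 - l2
            if l3 < 0.0:
                continue
            p = l1 * v_m + l2 * v_next + l3 * v_tilde
            phi = p @ D @ p + 2.0 * (l2 * beta_next + l3 * beta_tilde)
            if phi < best:
                best, best_lam = phi, (l1, l2, l3)
    l1, l2, l3 = best_lam
    # Aggregate discrete gradient and aggregate locality measure.
    return l1 * v_m + l2 * v_next + l3 * v_tilde, l2 * beta_next + l3 * beta_tilde

# With D = I and zero locality measures, aggregating v_m = (1,0),
# v_{k+1} = (0,1), and v~_k = (1,1) yields the minimum-norm point (1/2, 1/2).
v_agg, b_agg = aggregate(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                         np.array([1.0, 1.0]), 0.0, 0.0, np.eye(2))
```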

Matrix Updating. In the LDGB, both the limited memory BFGS (L-BFGS) and the limited memory SR1 (L-SR1) update formulae [23] are used in the calculation of the search direction and the aggregate values. The idea of limited memory matrix updating is that, instead of storing the large \(n \times n\) matrices \(D^k\), one stores a certain (usually small) number of vectors \({\mathbf {s}}_k= {\mathbf {y}}_{k+1}-{\mathbf {x}}_k\) and \({\mathbf {u}}_k={\mathbf {v}}_{k+1}-{\mathbf {v}}_m\) obtained at the previous iterations of the algorithm, and uses these vectors to implicitly define the variable metric matrices. Note that, due to the use of null steps, we may have \({\mathbf {x}}_{k+1}={\mathbf {x}}_k\), and thus, we use here the auxiliary point \({\mathbf {y}}_{k+1}\) instead of \({\mathbf {x}}_{k+1}\).

Let us denote by \(\hat{m}_c\) the user-specified maximum number of stored correction vectors (\(3 \le \hat{m}_c\)) and by \(\hat{m}_k = \min \,\{\,k-1,\hat{m}_c\,\}\) the current number of stored correction vectors. Then, the \(n \times \hat{m}_k\) dimensional correction matrices \(S_k\) and \(U_k\) are defined by

$$\begin{aligned} S_k&:=\begin{bmatrix}{\mathbf {s}}_{k-\hat{m}_k}&\ldots&{\mathbf {s}}_{k-1}\end{bmatrix} \qquad \text {and} \\ U_k&:=\begin{bmatrix}{\mathbf {u}}_{k-\hat{m}_k}&\ldots&{\mathbf {u}}_{k-1}\end{bmatrix}. \nonumber \end{aligned}$$

The inverse L-BFGS update is defined by the formula

$$\begin{aligned} D^k:=\vartheta _k I +\begin{bmatrix}S_k&\vartheta _k U_k\end{bmatrix}\begin{bmatrix} (R_k^{-1})^T (C_k+\vartheta _k U_k^T U_k)R_k^{-1}&-(R_k^{-1})^T\\ -R_k^{-1}&0 \end{bmatrix}\begin{bmatrix}S_k^T \\ \vartheta _k U_k^T \end{bmatrix}, \end{aligned}$$

where \(R_k\) is an upper triangular matrix of order \(\hat{m}_k\) given by the form

$$\begin{aligned} (R_k)_{ij} = {\left\{ \begin{array}{ll} ({\mathbf {s}}_{k-\hat{m}_k-1+i})^T({\mathbf {u}}_{k-\hat{m}_k-1+j}), &{}\text {if } \,\, i \le j\\ 0, &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

\(C_k\) is a diagonal matrix of order \(\hat{m}_k\) such that

$$\begin{aligned} C_k = {{\mathrm{diag}}}\,\left[ {\mathbf {s}}_{k-\hat{m}_k}^T {\mathbf {u}}_{k-\hat{m}_k}, \ldots , {\mathbf {s}}_{k-1}^T {\mathbf {u}}_{k-1}\right] , \end{aligned}$$

and \(\vartheta _k\) is a positive scaling parameter.

In addition, the inverse L-SR1 update is defined by

$$\begin{aligned} D^k:=\vartheta _k I-(\vartheta _k U_k-S_k)(\vartheta _k U_k^T U_k - R_k - R_k^T + C_k)^{-1} (\vartheta _k U_k-S_k)^T. \end{aligned}$$

In the case of a null step, the LDGB uses the L-SR1 update formula, since this formula allows us to preserve the boundedness and some other properties of the generated matrices which guarantee the global convergence of the method. Otherwise, since these properties are not required after a serious step, the more efficient L-BFGS update is employed. In the LDGB, the individual updates that would violate positive definiteness are skipped (for more details, see [10–12, 24]).
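As a sanity check on the compact representation, the sketch below (illustrative NumPy; it forms \(D^k\) densely, which the LDGB never does) builds the inverse L-BFGS matrix from the formula above and compares it with the result of applying the \(\hat{m}_k\) inverse BFGS updates one by one starting from \(\vartheta _k I\), which the compact form of [23] reproduces exactly:

```python
import numpy as np

def lbfgs_inverse_compact(S, U, theta):
    """Inverse L-BFGS matrix from the compact representation [23].
    Columns of S and U hold the correction vectors s_i, u_i, oldest first."""
    n, m = S.shape
    SU = S.T @ U
    R = np.triu(SU)                        # (R)_{ij} = s_i^T u_j for i <= j
    C = np.diag(np.diag(SU))               # diag(s_i^T u_i)
    Rinv = np.linalg.inv(R)
    W = np.hstack([S, theta * U])
    M = np.vstack([
        np.hstack([Rinv.T @ (C + theta * (U.T @ U)) @ Rinv, -Rinv.T]),
        np.hstack([-Rinv, np.zeros((m, m))]),
    ])
    return theta * np.eye(n) + W @ M @ W.T

def bfgs_inverse_recursive(S, U, theta):
    """Reference: apply the m inverse BFGS updates in order from theta*I."""
    n, m = S.shape
    H = theta * np.eye(n)
    for i in range(m):
        s, u = S[:, i], U[:, i]
        rho = 1.0 / (u @ s)
        V = np.eye(n) - rho * np.outer(s, u)
        H = V @ H @ V.T + rho * np.outer(s, s)
    return H

# Random correction pairs with positive curvature s_i^T u_i > 0.
rng = np.random.default_rng(0)
n, m = 6, 3
S = rng.standard_normal((n, m))
B = rng.standard_normal((n, n))
B = B @ B.T + n * np.eye(n)                # SPD matrix generating u_i = B s_i
U = B @ S
D_compact = lbfgs_inverse_compact(S, U, 0.5)
D_recursive = bfgs_inverse_recursive(S, U, 0.5)
```

The two constructions agree to machine precision, which is exactly why the LDGB can work with the correction matrices \(S_k\) and \(U_k\) alone.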

Stopping Criterion. For smooth functions, a necessary condition for a local minimum is that the gradient has to be zero, and by continuity, it becomes small when we are close to an optimal point. This is no longer true when we replace the gradient by an arbitrary subgradient or a discrete gradient. Due to the aggregation procedure, we have quite a useful approximation to the gradient at our disposal, namely the aggregate discrete gradient \(\tilde{{\mathbf {v}}}_k\). However, as a stopping criterion, the direct test \(||\tilde{{\mathbf {v}}}_k|| < \delta _k\), for some \(\delta _k > 0\), is too uncertain if the current piecewise linear approximation of the objective function is too rough. Therefore, we use the term \(\tilde{{\mathbf {v}}}_k^T D^k \tilde{{\mathbf {v}}}_k = - \tilde{{\mathbf {v}}}_k^T {\mathbf {d}}_k\) and the aggregate subgradient locality measure \(\tilde{\beta }_k\) to improve the accuracy of \(||\tilde{{\mathbf {v}}}_k||\). Hence, the stopping parameter \(w_k\) at iteration k is defined by

$$\begin{aligned} w_k := - \tilde{{\mathbf {v}}}_k^T {\mathbf {d}}_k + 2 \tilde{\beta }_k. \end{aligned}$$

The inner iteration stops if \(w_k \le \delta _k\) and the outer iteration—and, thus, the algorithm—stops if \(\delta _k \le \varepsilon \) for some user-specified \(\varepsilon > 0\). The parameter \(w_k\) is also used during the line search procedure to represent the desirable amount of descent.

Global Convergence. If the LDGB algorithm terminates after a finite number of iterations, say at iteration k, then the point \({\mathbf {x}}_k\) is a stationary point of f. Otherwise, every accumulation point \(\bar{{\mathbf {x}}}\) of the sequence generated by the LDGB algorithm is a stationary point of f [10].


About this article


Cite this article

Karmitsa, N. Testing Different Nonsmooth Formulations of the Lennard–Jones Potential in Atomic Clustering Problems. J Optim Theory Appl 171, 316–335 (2016). https://doi.org/10.1007/s10957-016-0955-5

