
1 Introduction

Function optimization plays an important role in a number of applications, ranging from simulation in aerodynamics to fine-tuning machine learning algorithms. Although several problems can be modeled by convex functions, most real-world applications pose a bigger challenge, since they are usually encoded by non-convex functions, which may contain both local and global optima. In light of that, one may handle such functions by means of two approaches: (i) the first concerns trying different optimization techniques, and (ii) the second aims at finding a different representation of the search space in order to deal with smoother fitness landscapes.

In the last decades, the scientific community has focused even more on nature-inspired optimization techniques, the so-called meta-heuristics [23]. Such techniques address the optimization problem through different mechanisms, such as evolutionary processes, mimetism, and swarm intelligence, just to name a few. These techniques have been in the spotlight mainly due to their elegance and ease of implementation, as well as their solid results in a number of well-known problems in the literature [1].

Recently, Malan and Engelbrecht [11] presented an interesting study aimed at predicting, based on the fitness landscape, situations in which the well-known Particle Swarm Optimization (PSO) algorithm is likely to fail. Basically, depending on the “smoothness degree” of the fitness landscape, one can anticipate the likely performance of the algorithm. Once again, the fitness function plays an important role in the optimization problem, and finding suitable representations of that function in different search spaces is of great importance to obtain more accurate results. Years before their work, Merz and Freisleben [12] presented a study on fitness landscape analysis for memetic algorithms, and Humeau et al. [9] conducted a similar study in the context of local search algorithms.

Some years ago, Fister et al. [4] presented a modified version of the Firefly Algorithm based on quaternions, and later on Fister et al. [3] proposed a similar approach for the Bat Algorithm. Roughly speaking, the quaternion algebra extends the complex numbers by representing a number with four real variables [6], and it is widely used in areas where rotations must be performed with minimal computation, such as spacecraft controllers, for instance. Usually, standard representations of meta-heuristic techniques, i.e., the ones based on Euclidean spaces, tend to get trapped in local optima in higher-dimensional spaces, since the fitness landscapes may become even more complicated. That is the main motivation for using quaternion-based algebra.

One of the main drawbacks of swarm-driven meta-heuristics concerns their computational burden, since the fitness function needs to be evaluated whenever a possible solution changes its position. Since all movement decisions are taken collectively, a single movement of a possible solution may affect all remaining ones, thus requiring their repositioning. Harmony Search (HS) is a technique that updates only one solution at each iteration, making it one of the fastest meta-heuristic optimization techniques [5]. The idea is to model each decision variable as an instrument, the best combination of them being the chosen solution. By best combination we mean the one which provides “the music with optimal harmony”.

One can refer to a number of different variants of HS, such as Improved Harmony Search [10], Global-best Harmony Search [13], and Self-adaptive Global-best Harmony Search [14], just to name a few. However, to the best of our knowledge, there is no Harmony Search implementation based on quaternions to date. Therefore, the main contribution of this paper is to propose a quaternion-oriented Harmony Search, hereinafter called Quaternion Harmony Search (QHS). The proposed approach is compared against PSO, HS and Improved Harmony Search (IHS) for the task of benchmark function optimization. Also, we validate the proposed approach by fine-tuning the parameters of Restricted Boltzmann Machines for the task of binary image reconstruction, which turns out to be another contribution of this work.

The remainder of the paper is organized as follows. Sections 2 and 3 present the theoretical background on quaternions and Harmony Search, respectively. Sections 4 and 5 present the methodology and experiments, respectively. Finally, Sect. 6 states conclusions and future works.

2 Quaternion Algebra

A quaternion q is composed of real and complex numbers, i.e., \(q=x_0+x_1i+x_2j+x_3k\), where \(x_0,x_1,x_2,x_3\in \mathfrak {R}\) and i, j and k are imaginary units satisfying the following set of equations:

$$\begin{aligned} ij=k, \end{aligned}$$
(1)
$$\begin{aligned} jk=i, \end{aligned}$$
(2)
$$\begin{aligned} ki=j, \end{aligned}$$
(3)
$$\begin{aligned} ji=-k, \end{aligned}$$
(4)
$$\begin{aligned} kj=-i, \end{aligned}$$
(5)
$$\begin{aligned} ik=-j, \end{aligned}$$
(6)

and

$$\begin{aligned} i^2=j^2=k^2=-1. \end{aligned}$$
(7)

Roughly speaking, a quaternion q is represented in a 4-dimensional space over the real numbers, i.e., \(\mathfrak {R}^4\). In practice, we can work with the real coefficients only, since most applications, such as the one addressed in this work, do not make use of the imaginary part.

Given two quaternions \(q_1=x_0+x_1i+x_2j+x_3k\) and \(q_2=y_0+y_1i+y_2j+y_3k\), the quaternion algebra defines a set of main operations [2]. The addition, for instance, can be defined by:

$$\begin{aligned} q_1+q_2&=(x_0+x_1i+x_2j+x_3k)+(y_0+y_1i+y_2j+y_3k)\\ \nonumber&=(x_0+y_0)+(x_1+y_1)i+(x_2+y_2)j+(x_3+y_3)k, \end{aligned}$$
(8)

while the subtraction is defined as follows:

$$\begin{aligned} q_1-q_2&=(x_0+x_1i+x_2j+x_3k)-(y_0+y_1i+y_2j+y_3k)\\ \nonumber&=(x_0-y_0)+(x_1-y_1)i+(x_2-y_2)j+(x_3-y_3)k. \end{aligned}$$
(9)

Another important operation is the norm, which maps a given quaternion to a real-valued number, as follows:

$$\begin{aligned} N(q_1)&=N(x_0+x_1i+x_2j+x_3k)\\ \nonumber&=\sqrt{x_0^2+x_1^2+x_2^2+x_3^2}. \end{aligned}$$
(10)

Finally, Fister et al. [3, 4] introduced two other operations, qrand and qzero. The former initializes a given quaternion with values drawn from a Gaussian distribution, and it can be defined as follows:

$$\begin{aligned} qrand()=\{x_i=\mathcal{N}(0,1)|i\in \{0,1,2,3\}\}. \end{aligned}$$
(11)

The latter function initializes a quaternion with zero values, as follows:

$$\begin{aligned} qzero()=\{x_i=0|i\in \{0,1,2,3\}\}. \end{aligned}$$
(12)

Although there are other operations, we define only the ones employed in this work.
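For illustration purposes, a minimal Python sketch of the operations above is given below, with a quaternion stored as a 4-dimensional real vector; the function names mirror Eqs. 8–12 and are ours, not part of any particular library.

```python
import numpy as np

def qrand():
    """Initialize a quaternion with values drawn from N(0, 1) (Eq. 11)."""
    return np.random.normal(0.0, 1.0, size=4)

def qzero():
    """Initialize a quaternion with zero values (Eq. 12)."""
    return np.zeros(4)

def qadd(q1, q2):
    """Component-wise addition of two quaternions (Eq. 8)."""
    return q1 + q2

def qsub(q1, q2):
    """Component-wise subtraction of two quaternions (Eq. 9)."""
    return q1 - q2

def qnorm(q):
    """Map a quaternion to a real-valued number via its norm (Eq. 10)."""
    return np.sqrt(np.sum(q ** 2))
```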

3 Harmony Search

Harmony Search is a meta-heuristic algorithm inspired by the improvisation process of music players. Musicians often improvise the pitches of their instruments searching for a perfect state of harmony [5]. The main idea is to use the same process adopted by musicians to create new songs in order to obtain a near-optimal solution according to some fitness function. Each possible solution is modelled as a harmony, and each musician corresponds to one decision variable.

Let \(\mathcal{\phi }=(\varvec{\phi }_1,\varvec{\phi }_2,\ldots ,\varvec{\phi }_N)\) be a set of harmonies that compose the so-called “Harmony Memory”, such that \(\varvec{\phi }_i\in \mathfrak {R}^M\). At each iteration, the HS algorithm generates a new harmony vector \(\varvec{\hat{\phi }}\) based on memory considerations, pitch adjustments, and randomization (music improvisation). Further, the new harmony vector \(\varvec{\hat{\phi }}\) is evaluated in order to decide whether it should be accepted into the harmony memory: if \(\varvec{\hat{\phi }}\) is better than the worst harmony, the latter is replaced by the new one. Roughly speaking, the HS algorithm repeats this process of creating and evaluating new harmonies until some convergence criterion is met.

In regard to the memory consideration step, the idea is to model the process of creating songs, in which the musician may use his/her memory of good musical notes to create a new song. This process is modelled by the Harmony Memory Considering Rate (HMCR) parameter, which is the probability of choosing one value from the historical values stored in the harmony memory, with \((1-HMCR)\) being the probability of randomly choosing one feasible value, as follows:

$$\begin{aligned} \hat{\phi }^j= & {} \left\{ \begin{array}{ll} \phi ^j_A &{} \text { with probability HMCR} \\ \theta \in \varPhi _j &{} \text { with probability (1-HMCR),} \end{array}\right. \end{aligned}$$
(13)

where \(A\sim \mathcal{U}(1,2,\ldots ,N)\), and \(\varvec{\varPhi }=\{\varPhi _1,\varPhi _2,\ldots ,\varPhi _M\}\) stands for the set of feasible values for each decision variable.

Further, every component j of the new harmony vector \(\varvec{\hat{\phi }}\) is examined to determine whether it should be pitch-adjusted or not, which is controlled by the Pitch Adjusting Rate (PAR) variable, according to Eq. 14:

$$\begin{aligned} \hat{\phi }^j= & {} \left\{ \begin{array}{ll} \hat{\phi }^j\,\pm \, \varphi _j \varrho &{} \text { with probability PAR} \\ \hat{\phi }^j &{} \text { with probability (1-PAR).} \end{array}\right. \end{aligned}$$
(14)

The pitch adjustment is often used to improve solutions and to escape from local optima. This mechanism shifts the value of a decision variable of the new harmony towards a neighbouring value, where \(\varrho \) is an arbitrary distance bandwidth, and \(\varphi _j\sim \mathcal{U}(0,1)\). In the following, we briefly present the Improved Harmony Search technique, which is used in the experimental section.
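As a concrete illustration, the sketch below shows one HS improvisation step under the assumption that the harmony memory is stored as an \(N\times M\) matrix and each decision variable j is bounded by lower[j] and upper[j]; the function and variable names are ours and merely illustrate Eqs. 13 and 14.

```python
import numpy as np

def improvise(memory, lower, upper, hmcr=0.7, par=0.7, bandwidth=1.0):
    """Create a new harmony via memory consideration (Eq. 13)
    and pitch adjustment (Eq. 14)."""
    n, m = memory.shape
    new_harmony = np.zeros(m)
    for j in range(m):
        if np.random.rand() < hmcr:
            # Memory consideration: reuse the j-th value of a random harmony.
            new_harmony[j] = memory[np.random.randint(n), j]
            if np.random.rand() < par:
                # Pitch adjustment: shift the value within the bandwidth.
                new_harmony[j] += np.random.uniform(-1.0, 1.0) * bandwidth
        else:
            # Randomization: draw a feasible value for variable j.
            new_harmony[j] = np.random.uniform(lower[j], upper[j])
    return np.clip(new_harmony, lower, upper)
```

After evaluation, the new harmony replaces the worst member of the memory whenever its fitness is better, as described above.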

3.1 Improved Harmony Search

The Improved Harmony Search (IHS) [10] differs from traditional HS by updating the PAR and \(\varrho \) values dynamically. The PAR updating formulation at time step t is given by:

$$\begin{aligned} PAR^t = PAR_{min}+\frac{PAR_{max}-PAR_{min}}{T}t, \end{aligned}$$
(15)

where T stands for the number of iterations, and \(PAR_{min}\) and \(PAR_{max}\) denote the minimum and maximum PAR values, respectively. In regard to the bandwidth value at time step t, it is computed as follows:

$$\begin{aligned} \varrho ^t=\varrho _{max}\exp {\frac{\ln (\varrho _{min}/\varrho _{max})}{T}t}, \end{aligned}$$
(16)

where \(\varrho _{min}\) and \(\varrho _{max}\) stand for the minimum and maximum values of \(\varrho \), respectively.
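A minimal sketch of these two updates is given below, assuming T total iterations; the parameter values shown are illustrative only.

```python
import numpy as np

def ihs_par(t, T, par_min=0.1, par_max=0.7):
    """Linearly increase PAR from par_min to par_max over T iterations (Eq. 15)."""
    return par_min + (par_max - par_min) * t / T

def ihs_bandwidth(t, T, rho_min=1e-5, rho_max=1.0):
    """Exponentially decay the bandwidth from rho_max to rho_min (Eq. 16)."""
    return rho_max * np.exp(np.log(rho_min / rho_max) * t / T)
```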

3.2 Quaternion Harmony Search

The proposed approach aims at mapping the problem of optimizing variables from the Euclidean space to the quaternion space. As aforementioned, the idea is to obtain smoother representations of the fitness landscape, thus making the problem easier to handle.

In the standard Harmony Search, each harmony \(\varvec{\phi }_i\in \mathfrak {R}^M\), \(i=1,2,\ldots ,N\), is modeled as an array containing M variables to be optimized, such that \(\phi _{ij}\in \mathfrak {R}\). In QHS, each decision variable j is represented by a quaternion \(q_j\in \mathfrak {R}^4\), such that each harmony can be seen as a matrix \(\varvec{\phi }'_i\in \mathfrak {R}^{4\times M}\). Therefore, each harmony is no longer an array of decision variables, but a matrix instead.

However, we can map each quaternion to a real-valued number in order to reuse the standard HS machinery. Basically, one computes \(\phi _{ij}=N(\phi '_{ij})\), \(i=1,2,\ldots ,N\) and \(j=1,2,\ldots ,M\), and then the standard HS procedure is executed as usual. Note, however, that the optimization process is conducted at the quaternion level: QHS aims at finding, for each decision variable, a quaternion whose norm minimizes the fitness function, whereas an ordinary HS learns the values of the decision variables that minimize the fitness function directly.
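The sketch below illustrates this mapping, assuming a fitness function defined over M real-valued decision variables: each harmony is kept as a \(4\times M\) matrix of quaternion coefficients, and the norm of Eq. 10 recovers the real value of each variable before the fitness is computed. The names and the Sphere example are ours.

```python
import numpy as np

def quaternion_to_solution(harmony_q):
    """Map a 4 x M quaternion harmony to an M-dimensional real solution
    by taking the norm of each column (Eq. 10)."""
    return np.sqrt(np.sum(harmony_q ** 2, axis=0))

def evaluate(harmony_q, fitness):
    """Evaluate a quaternion harmony with a standard real-valued fitness."""
    return fitness(quaternion_to_solution(harmony_q))

# Example: a memory of N quaternion harmonies, each with M decision variables,
# initialized with qrand() per variable and evaluated on the Sphere function.
N, M = 20, 2
memory_q = np.random.normal(0.0, 1.0, size=(N, 4, M))
sphere = lambda x: np.sum(x ** 2)
fits = [evaluate(h, sphere) for h in memory_q]
```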

4 Methodology

In this section, we present the methodology employed to validate the proposed approach. The next sections describe the benchmark functions, the techniques used for comparison purposes, and the statistical analysis.

4.1 Functions

The benchmark functions used in the experiments are the following:

  • Sphere Function:

    \(f(x,y) = x^2 + y^2\)

    The minimum is \(f(0,0) = 0\), and the domain is \(-\infty< x,y < \infty \).

  • Ackley’s Function:

    \(f(x,y) = -20e^{-0.2 \sqrt{0.5 (x^2 + y^2)}} -e^{0.5(\cos (2 \pi x) + \cos (2 \pi y))} + e + 20\)

    The minimum is \(f(0,0) = 0\), and the domain is \(-5 \le x,y \le 5\).

  • Rosenbrock’s Function:

    \(f(x,y) = 100 \left( y - x^{2}\right) ^{2} + \left( x - 1\right) ^{2}\)

    The minimum is \(f(1,1) = 0\), and the domain is \(-\infty< x,y < \infty \).

  • Beale’s Function:

    \( f(x,y) = \left( 1.5 - x + xy \right) ^{2} + \left( 2.25 - x + x y^{2}\right) ^{2} + \left( 2.625 - x+ xy^{3}\right) ^{2}\)

    The minimum is \(f(3,0.5) = 0\), and the domain is \(-4.5< x,y < 4.5\).

  • Matyas Function:

    \(f(x,y) = 0.26 \left( x^{2} + y^{2}\right) - 0.48 xy\)

    The minimum is \(f(0,0) = 0\), and the domain is \(-10\le x,y \le 10\).

  • Levi Function N.13:

    \(f(x,y) = \sin ^{2}\left( 3\pi x\right) +\left( x-1\right) ^{2} \left( 1+\sin ^{2}\left( 3\pi y\right) \right) +\left( y-1\right) ^{2} \left( 1+\sin ^{2}\left( 2\pi y\right) \right) \)

    The minimum is \(f(1,1) = 0\), and the domain is \(-10< x,y < 10\).

  • Three-Hump Camel Function:

    \(f(x,y) = 2x^{2} - 1.05x^{4} + \frac{x^{6}}{6} + xy + y^{2}\)

    The minimum is \(f(0,0) = 0\), and the domain is \(-5 \le x,y \le 5\).

  • Easom Function:

    \(f(x,y) = -\cos \left( x\right) \cos \left( y\right) e^{\left( -\left( \left( x-\pi \right) ^{2} + \left( y-\pi \right) ^{2}\right) \right) }\)

    The minimum is \(f(\pi , \pi ) = -1\), and the domain is \(-100< x,y < 100\).

  • Styblinski-Tang Function:

    \(f(x,y) = 0.5\left( (x^4 - 16x^2 + 5x) + (y^4 - 16y^2 + 5y)\right) \)

    The minimum is \(f\left( -2.9035, -2.9035\right) \approx -78.3323\), and the domain is \(-5< x,y < 5\).

  • Schaffer Function N2:

    \(f(x,y) = 0.5 + \frac{\sin ^2(x^2 - y^2) - 0.5}{\left[ 1 + 0.001(x^2 + y^2)\right] ^2}\)

    The minimum is \(f\left( 0, 0\right) = 0\), and the domain is \(-100< x,y < 100\).
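For illustration, two of these functions (Sphere and Ackley’s) can be written directly as Python functions, as in the sketch below; the remaining ones follow the same pattern.

```python
import numpy as np

def sphere(x, y):
    """Sphere function, global minimum f(0, 0) = 0."""
    return x ** 2 + y ** 2

def ackley(x, y):
    """Ackley's function, global minimum f(0, 0) = 0."""
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x ** 2 + y ** 2)))
            - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
            + np.e + 20.0)
```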

4.2 Meta-Heuristic Techniques

We compare the efficiency and effectiveness of QHS against three other methods, namely naïve HS, IHS and PSO. For each optimization method and function, we conducted a cross-validation with 20 runs. Table 1 displays the parameter setup, whose values were empirically chosen [18]. Notice we employed 20 agents and 15,000 iterations for all meta-heuristic techniques, and both x and y decision variables of the aforementioned functions were initialized within the range \([-10,10]\).

Table 1. Parameter values used for all optimization techniques.
Table 2. Mean values obtained through cross-validation.

5 Experimental Evaluation

In this section, we present the experimental results considering the aforementioned optimization techniques in the context of benchmark functions (Sect. 5.1), as well as to fine-tune Restricted Boltzmann Machines (Sect. 5.2).

5.1 Benchmark Functions

As aforementioned in Sect. 4.1, we considered 10 benchmark functions to evaluate the robustness of QHS. Table 2 displays the mean values obtained by all techniques. Since we are dealing with minimization problems, the smaller the values, the better the techniques. The values in bold stand for the best techniques according to the Wilcoxon signed-rank test [22] with significance level of 0.05.

Clearly, we can observe QHS obtained the best results in 9 out of 10 functions, and in 4 of them it was the only technique to achieve the best values. In fact, if one takes into account the convergence of the techniques, one can realize HS took longer to converge, even on a convex function like Sphere, for instance. Figure 1a depicts the convergence process with respect to the Sphere function. We constrained the plot to the first 600 iterations to have a better look at the beginning of the process. One can observe QHS converges within the very first iterations, while HS and IHS need a few hundred of them.

Figure 1b displays the convergence study over Rosenbrock’s function. All techniques except PSO achieved the global minimum, with HS being the fastest to converge, followed by QHS and IHS. Finally, in regard to Beale’s function (Fig. 1c), QHS took longer than all other techniques to converge, but it obtained the best results. In fact, HS-based techniques usually suffer from slow convergence when compared to swarm-based ones, since Harmony Search produces only one new solution at each iteration. We can also observe that the convergence of QHS on Rosenbrock’s and Beale’s functions follows a “stair pattern”, which means QHS can get trapped in local optima (e.g., the plateaus in the convergence curve), but also has the ability to escape from them.

Fig. 1. Convergence analysis considering (a) Sphere, (b) Rosenbrock’s and (c) Beale’s benchmark functions.

Table 3 presents the mean computational load in seconds considering all techniques and functions. Clearly, QHS is the slowest one, since we need an extra loop to access the content of each decision variable (each variable is now represented as a 4-dimensional vector). In fact, for some problems, QHS could be faster if we employed a different convergence criterion, since we fixed the number of iterations for all techniques in this work.

Table 3. Mean computational load in seconds.

5.2 Fine-Tuning Restricted Boltzmann Machines

Restricted Boltzmann Machines (RBMs) are stochastic neural networks that aim at learning input representations by means of latent variables placed in a hidden layer [19]. They have become widely used in recent years due to the “deep learning phenomenon”, in which RBMs can be stacked on top of each other to obtain the so-called Deep Belief Networks.

Although RBMs have been extensively used in the last years for many tasks, such as image classification and speech recognition, they are parameter-dependent, which may affect the final results. To the best of our knowledge, only a few works have considered optimizing RBMs by means of meta-heuristic techniques [15–17]. In addition, we have not observed any work that attempts to use quaternion-based optimization in the context of RBM fine-tuning.

In this section, we consider optimizing four parameters for the task of binary image reconstruction: \(n\in [5,100]\), \(\eta \in [0.1,0.9]\), \(\lambda \in [0.1,0.9]\) and \(\varphi \in [0.00001,0.01]\). In this case, n stands for the number of hidden neurons, \(\eta \) is the learning rate, \(\lambda \) denotes the weight decay parameter, and \(\varphi \) stands for the momentum [8]. The learning rate controls the speed of convergence, the weight decay keeps the weight values small enough for the learning process, and the momentum helps avoid oscillations during learning.

Therefore, we have a four-dimensional search space with three real-valued variables, as well as the integer-valued number of hidden units. Roughly speaking, the proposed approach aims at selecting the set of RBM parameters that minimizes the mean squared error (MSE) of the reconstructed images from the training set. After that, the selected set of parameters is applied to reconstruct the images of the test set. We employed the well-known MNIST dataset, which is composed of images of handwritten digits. The original version contains a training set with 60,000 images of the digits ‘0’–‘9’, as well as a test set with 10,000 images. Due to the computational burden of RBM model selection, we decided to employ the original test set together with a reduced version of the training set (2% of the original training set).
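To make the setup concrete, the sketch below shows how a candidate solution could be decoded into RBM hyper-parameters and scored by the reconstruction MSE over the training set. It is a simplified, self-contained illustration (a Bernoulli RBM trained with a mean-field CD-1 update), not the implementation used in the experiments.

```python
import numpy as np

# Search-space bounds: n (hidden units), eta (learning rate),
# lambda (weight decay) and varphi (momentum).
LOWER = np.array([5, 0.1, 0.1, 0.00001])
UPPER = np.array([100, 0.9, 0.9, 0.01])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_mse(solution, data, epochs=10, batch_size=20):
    """Decode one candidate solution, train a Bernoulli RBM with a
    simplified CD-1 rule, and return the reconstruction MSE on `data`
    (a num_samples x num_visible binary matrix)."""
    n_hidden = int(round(solution[0]))
    eta, weight_decay, momentum = solution[1], solution[2], solution[3]
    n_visible = data.shape[1]
    W = 0.01 * np.random.randn(n_visible, n_hidden)
    a = np.zeros(n_visible)                       # visible biases
    b = np.zeros(n_hidden)                        # hidden biases
    dW = np.zeros_like(W)
    for _ in range(epochs):
        perm = np.random.permutation(data.shape[0])
        for i in range(0, data.shape[0], batch_size):
            v0 = data[perm[i:i + batch_size]]
            h0 = sigmoid(v0 @ W + b)              # positive phase
            v1 = sigmoid(h0 @ W.T + a)            # reconstruction
            h1 = sigmoid(v1 @ W + b)              # negative phase
            grad = (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
            dW = momentum * dW + eta * (grad - weight_decay * W)
            W += dW
            a += eta * np.mean(v0 - v1, axis=0)
            b += eta * np.mean(h0 - h1, axis=0)
    recon = sigmoid(sigmoid(data @ W + b) @ W.T + a)
    return np.mean((data - recon) ** 2)
```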

We conducted a cross-validation with 20 runs, 10 iterations for the learning procedure of each RBM, and mini-batches of size 20. In addition, we also considered three learning algorithms: Contrastive Divergence (CD) [7], Persistent Contrastive Divergence (PCD) [20] and Fast Persistent Contrastive Divergence (FPCD) [21]. Finally, the Wilcoxon signed-rank test [22] with a significance level of 0.05 was used for statistical validation. Table 4 presents the results concerning RBM fine-tuning, where PSO, HS, IHS and a random search (RS) are compared against QHS.

Table 4. Average MSE values considering MNIST dataset.

One can observe QHS obtained the best results using CD and PCD as training algorithms, while IHS obtained the lowest errors using FPCD. As a matter of fact, FPCD is used to control possible deviations of the probability distribution of the input data, and it may require more iterations for convergence. QHS would probably require more iterations than IHS for some applications, as suggested by Fig. 1. However, the best results overall were obtained by means of CD and PCD using the proposed QHS technique.

6 Conclusions

In this paper, we proposed a Harmony Search approach based on quaternions, inspired by previous works that proposed modifications of the well-known Firefly Algorithm and Bat Algorithm in the same context. The Quaternion Harmony Search was compared against naïve Harmony Search, Improved Harmony Search and Particle Swarm Optimization on 10 benchmark functions, and it demonstrated the ability to escape from local optima as well as to converge faster (for some problems). However, QHS pays the price of a higher computational burden, since it maps each decision variable onto a 4-dimensional space: given an optimization problem with N variables, each possible solution can be seen as a matrix of dimensions \(4\times N\). Also, we demonstrated the validity of quaternion-based optimization in the context of RBM parameter fine-tuning.

In regard to future works, we intend to implement other variants of the Harmony Search using quaternions, as well as other swarm-based meta-heuristics, such as Particle Swarm Optimization and Cuckoo Search. Additionally, we aim at studying the behavior of QHS in high dimensional spaces, and also parallel implementations of quaternion-based optimization techniques in both CPU and GPU devices.