
1 Introduction

Discrete optimization often arises in AI and smart solution development: it enters AI development whenever there is a need to optimize over discrete sets. Examples from smart solution development include the vehicle routing problem and its variants [1, 2], the bin packing problem and its variants [3, 4], network planning [5], the knapsack problem [6], the assignment problem [7], the transportation problem [8], and many other problems that are becoming increasingly relevant in the context of smart cities.

Recent decades have brought enormous advances in hardware technology, enabling massively parallel systems. The algorithm presented in this work is motivated by exploiting this fact to improve existing optimization techniques. Massive parallelization allows a much larger search space to be explored, by branching the search into promising directions and then investigating each of them concurrently.

Variable neighborhood descent (VND) is a meta-heuristic method described by Duarte, Sanchez-Oro, Mladenovic and Todosijevic in [9] for solving combinatorial optimization problems. The algorithm is shown in Algorithm 1. It explores the neighborhoods of the incumbent solution and iterates as long as it can find an improving solution. Namely, it keeps a list of neighborhood generation strategies (ways of modifying a candidate solution) and applies them to the current solution one at a time. If an improving solution is found among neighbors generated from a certain strategy, then the current solution is set to the improving solution and the next iteration is started from the first strategy. If a neighborhood generation strategy cannot produce an improving solution, the next strategy is examined. If no strategy can produce an improving solution, the algorithm stops and the current solution is returned as the search result. Notice that this ensures that the solution returned is a local optimum with respect to the defined neighborhood generation strategies.

Algorithm 1. Variable neighborhood descent (VND)
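The loop described above can be sketched in Python. This is a minimal illustration of Algorithm 1 with a best-improvement step; `cost` and the neighborhood-generating callables are placeholders to be supplied by the user, not part of the paper's reference implementation.

```python
def vnd(solution, cost, neighborhoods):
    """Variable neighborhood descent (Algorithm 1): apply neighborhood
    generation strategies in order, restarting from the first strategy
    whenever an improving solution is found."""
    k = 0
    while k < len(neighborhoods):
        neighbors = list(neighborhoods[k](solution))
        # Best-improvement step: take the best neighbor, if it improves.
        best = min(neighbors, key=cost, default=solution)
        if cost(best) < cost(solution):
            solution = best
            k = 0  # improving solution found: restart from the first strategy
        else:
            k += 1  # strategy exhausted: try the next one
    return solution  # a local optimum w.r.t. all strategies
```

On a toy instance, minimizing \(x^2\) over the integers with the single neighborhood \(\{x-1, x+1\}\), the procedure descends monotonically to 0.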

VND is often used as a local search procedure for the VNS (Variable neighborhood search) algorithm [10]. Also, it has been heavily used in various applications such as routing problems [1, 2], packing problems [3, 4], scheduling problems [11, 12] and many others.

We point out that VND selects only one improving solution. Although [9] specifies that the best improving solution in the neighborhood is selected, this is not necessary and can be changed: one can select the first improving, the best improving, a random improving, or some other improving solution. Whichever rule is used, if there are multiple improving solutions, only one of them is selected. This can result in convergence towards a local optimum instead of the global optimum. To alleviate this effect, we propose branching into a population of solutions. We call this variation of the VND algorithm the population-based variable neighborhood descent and describe it in the following sections.

The rest of the paper is organized as follows. The proposed algorithm is described in Sect. 2. Section 3 describes the experimental results achieved using the proposed algorithm in relation to the base algorithm. It also discusses the time complexity and gives usage advice for the proposed algorithm. Conclusions are given in Sect. 4.

2 Proposed Algorithm

By introducing a population of current solutions, we are expanding the search, making it less likely to get stuck in local optima. The population is limited to M units to prevent an explosion of solutions whose neighborhoods are searched. The proposed algorithm is shown in Algorithm 2. Improving neighbors are those better than the best solution in the current population. The next population is formed by selecting at most M units from the set of improving neighbors of all current solutions. If there are none, the next neighborhood generation strategy is examined. When the algorithm cannot find any improving solution for any strategy, the best solution in the last population is returned as the result.

Although the members of the population seem independent at first, they share information: the neighbors of a solution can only become candidates for the next population if they are better than the best solution in the current population.

If the maximum population size M is set to 1, the algorithm becomes the VND algorithm. Therefore, the proposed algorithm can be seen as a generalization of VND.

Algorithm 2. Population-based variable neighborhood descent
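As a sketch (ours, not the paper's reference implementation), Algorithm 2 can be written as follows, shown here with the best-M selection strategy; with `M=1` it reduces to VND as noted below.

```python
def population_vnd(initial, cost, neighborhoods, M):
    """Population-based VND (Algorithm 2) with BEST selection:
    keep up to M current solutions; a neighbor qualifies only if it
    improves on the best solution in the current population."""
    population = [initial]
    k = 0
    while k < len(neighborhoods):
        best_cost = min(cost(s) for s in population)
        improving = [n for s in population
                       for n in neighborhoods[k](s)
                       if cost(n) < best_cost]
        if improving:
            # Keep at most M units: here, the M best improving neighbors.
            population = sorted(improving, key=cost)[:M]
            k = 0  # restart from the first strategy
        else:
            k += 1  # no improving neighbor: try the next strategy
    return min(population, key=cost)  # best unit of the last population
```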

Selecting the next population among the improving neighbors of all current solutions can be done in various ways. There are three main selection strategies:

  • Randomly select M among all improving neighbors.

  • Select the best M among all improving neighbors. — This selection strategy leads to a quicker convergence; however, the result might be only a local optimum.

  • Select the first M improving neighbors. — This selection strategy speeds up the iteration, but can result in inferior quality or a higher number of iterations.
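The three strategies above can be sketched as a single selection function; this is a hedged illustration with names of our choosing, not code from the paper.

```python
import random

def select_next_population(improving, cost, M, strategy):
    """Select at most M units from the improving neighbors of all
    current solutions, according to the chosen selection strategy."""
    if len(improving) <= M:
        return list(improving)  # at most M candidates: take them all
    if strategy == "RANDOM":
        return random.sample(improving, M)
    if strategy == "BEST":
        return sorted(improving, key=cost)[:M]
    if strategy == "FIRST":
        return improving[:M]  # in order of discovery
    raise ValueError(f"unknown selection strategy: {strategy}")
```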

3 Evaluation

We performed experiments comparing VND and our algorithm on the capacitated vehicle routing problem (CVRP), using several similar search strategies. The aim of the experiment was not to produce the best possible results on the given dataset, but to show that population-based variable neighborhood descent on average outperforms the VND algorithm in solution quality. Therefore, we do not present the absolute results, but only the relative improvement (reduction) in the quality of the obtained solution. The evaluation code is publicly available (see Footnote 1).

Section 3.1 presents the evaluation results on CVRP instances. Section 3.2 provides a complexity analysis of the proposed approach compared to the base VND approach. Finally, Sect. 3.3 provides practical usage advice for the approach.

3.1 Capacitated Vehicle Routing Problem

The capacitated vehicle routing problem is a well-known problem in combinatorial optimization. It is defined as follows:

  • There are N customers.

  • Customer nodes are indexed by \(i \in \{1,...,N\}\).

  • The commodity should be delivered using K independent delivery vehicles of identical capacity C.

  • Each customer \(i \in \{1,...,N\}\) requires a quantity \(q_i\) of the commodity to be delivered, with \(q_i \in \{1,...,C\}\) for all \(i \in \{1,...,N\}\).

  • All deliveries start from node 0 which marks the central depot.

  • For each transit from node i to node j there is an associated cost \(c_{ij}\). The cost structure is assumed to be symmetric, \(c_{ij}=c_{ji}\ \forall i,j \in \{0,1,...,N\}\), and the cost of transit from a node to itself is zero, \(c_{ii}=0\ \forall i \in \{0,1,...,N\}\).

  • If delivery vehicle \(k \in \{1,...,K\}\) travels from node i to node j, then \(x_{i,j,k}=1\); otherwise \(x_{i,j,k}=0\).

  • Each customer node can only be visited by a single delivery vehicle. It is not possible to split a delivery among multiple delivery vehicles.

  • The aim is to minimize the total travel distance of all delivery trucks.

This can formally be defined as follows:

$$\begin{aligned} \text {minimize} \,&\sum _{i,j \in \{0,...,N\},\, k \in \{1,...,K\}} x_{i,j,k} \cdot c_{ij}\quad \text {subject to:} \end{aligned}$$
(1)
$$\begin{aligned}&\sum _{j \in \{0,...,N\},\,k \in \{1,...,K\}} x_{i,j,k} = 1,\quad \forall i \in \{1,...,N\} \end{aligned}$$
(2)
$$\begin{aligned}&\sum _{i \in \{0,...,N\},\,k \in \{1,...,K\}} x_{i,j,k} = 1,\quad \forall j \in \{1,...,N\}\end{aligned}$$
(3)
$$\begin{aligned}&\sum _{j \in \{1,...,N\},\,k \in \{1,...,K\}} x_{0,j,k} = K\end{aligned}$$
(4)
$$\begin{aligned}&\sum _{i \in \{1,...,N\},\,k \in \{1,...,K\}} x_{i,0,k} = K\end{aligned}$$
(5)
$$\begin{aligned}&\sum _{i \in \{1,...,N\},\,j \in \{0,...,N\}} x_{i,j,k} \cdot q_{i} \le C,\quad \forall k \in \{1,...,K\} \end{aligned}$$
(6)
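As an illustration, the objective (1) and constraints (2)–(6) can be checked mechanically for a candidate solution. The encoding below (one customer sequence per vehicle) is our own, not taken from the paper.

```python
def cvrp_cost(routes, demand, capacity, c):
    """Check constraints (2)-(6) for a candidate solution given as one
    customer sequence per vehicle, then return the objective (1).
    Node 0 is the depot, demand[i] = q_i, c[a][b] = c_ab."""
    visited = sorted(i for route in routes for i in route)
    # (2)-(3): every customer is entered and left exactly once.
    assert visited == sorted(demand), "each customer served exactly once"
    total = 0
    for route in routes:
        # (6): the load of each vehicle must not exceed its capacity C.
        assert sum(demand[i] for i in route) <= capacity, "capacity exceeded"
        # (4)-(5): every route starts and ends at the depot, node 0.
        path = [0] + route + [0]
        total += sum(c[a][b] for a, b in zip(path, path[1:]))
    return total
```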

We used the Augerat 1995 dataset from CVRPLIB [13]. We defined three neighborhood generation strategies: one merges two routes into one, one swaps stations between two routes, and one moves a station from one route to another. After each change, the affected routes are re-optimized using a simple TSP solver. In the starting solution, the number of routes is equal to the number of stations and each route contains a single station.
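To make the three moves concrete, here is a hedged sketch in our own encoding (a solution is a list of routes, each a list of station indices); the TSP re-optimization step is omitted.

```python
def merge(routes):
    """Neighbors obtained by merging two routes into one."""
    for a in range(len(routes)):
        for b in range(a + 1, len(routes)):
            rest = [r for k, r in enumerate(routes) if k not in (a, b)]
            yield rest + [routes[a] + routes[b]]

def swap(routes):
    """Neighbors obtained by swapping one station between two routes."""
    for a in range(len(routes)):
        for b in range(a + 1, len(routes)):
            for i in range(len(routes[a])):
                for j in range(len(routes[b])):
                    new = [list(r) for r in routes]
                    new[a][i], new[b][j] = new[b][j], new[a][i]
                    yield new

def move(routes):
    """Neighbors obtained by moving one station to another route."""
    for a in range(len(routes)):
        for b in range(len(routes)):
            if a != b:
                for i in range(len(routes[a])):
                    new = [list(r) for r in routes]
                    new[b].append(new[a].pop(i))
                    yield [r for r in new if r]  # drop emptied routes
```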

We solved the dataset using VND with three different selection strategies: one selects the best improving neighbor, one the first improving neighbor, and one a random improving neighbor. The dataset was solved 100 times per strategy. The average solution length over all instances for VND is:

  • 679 for RANDOM strategy,

  • 686 for FIRST strategy and

  • 687 for BEST strategy.

This information is provided in the hope that it will help readers get a better feel for the performance improvement of the proposed algorithm.

Fig. 1. Route length reduction for RANDOM selection strategy

We solved the same dataset with the proposed algorithm using the same neighborhood generation strategies, with and without strategy shuffling (permuting the list of neighborhood generation strategies from iteration to iteration), the same starting solution, and three analogous selection strategies: one selects the M best improving solutions, where M is the maximum population size; one selects the first M improving solutions; and one selects M random improving solutions. We used population sizes ranging from \(M=5\) to \(M=15\). The dataset was solved 100 times per strategy per population size. Parallelization was not used in population-based VND in these experiments. The results achieved by analogous strategies are presented in Figs. 1, 2 and 3; these figures visualize the relationship between the population size and the route length reduction, as well as the effect of strategy shuffling. We filtered out outlier results which deviated more than one standard deviation from the average result per strategy per population size. The results show that, on average, shorter routes were produced by our proposed algorithm; the length reduction is the relative difference of the proposed solution from the VND solution.

Figure 4 shows the relationship between the population size and the execution time of the algorithm for different strategies. Note that the figure compares the execution time of population-based VND against VND using the same strategy. The execution time is presented as a time increase factor: a factor of 1 corresponds to the execution time of the VND algorithm, a factor of 2 to twice that time, and so on. The presented data show a linear correlation between the time increase and the population size.

Fig. 2. Route length reduction for FIRST selection strategy

From the presented results it can be concluded that the proposed version of the algorithm produces better results. This conclusion generalizes: since the proposed algorithm searches the solution space more thoroughly, on non-deceptive problems it has a higher chance of avoiding local optima and reaching the global optimum. Conclusions can also be drawn about the presented strategies:

  • The RANDOM improving strategy performs a very robust search of the solution space: by allowing a population of solutions, more search paths are investigated. This is a good strategy when a problem has many local optima or is deceptive in nature.

  • The FIRST improving strategy is the best candidate when execution time is of critical importance. Introducing a population of solutions reduces the chance of getting stuck in a local optimum.

  • BEST improving strategy is the best candidate when we suspect the global optimum will be either a greedy solution or very similar to it.

It can be seen that for this CVRP problem the RANDOM improving strategy generates, on average, the best solutions. This is an expected result, since VRP instances are known to have many local optima.

Fig. 3. Route length reduction for BEST selection strategy

Fig. 4. Time increase

3.2 Time Complexity Analysis

In this section we examine how the time performance changes with respect to VND and discuss the three parts of the VND algorithm affected by the proposed changes.

Finding the Best Unit in the Population. Determining the best solution in the current population is a step that does not exist in the base VND algorithm. In this step, the best out of the current population of solutions is identified. Since the current population of solutions can have at most M solutions, the worst case complexity of this step is O(M).

Neighborhood Generation. In the base version of VND, one neighborhood is generated per iteration. Here, between 1 and M neighborhoods are generated per iteration. Assume that neighborhood generation produces neighborhoods of approximately the same size for each solution. Then, since up to M neighborhoods are generated, if both algorithms are run sequentially (without parallelization), our algorithm will be M times slower whenever the population size is maximal in each iteration, which is the worst case. If neighborhood generation can produce neighborhoods of significantly different sizes for different solutions, our algorithm can be more than M times slower when generating M neighborhoods in each iteration. If the neighborhoods are generated concurrently, with a separate thread generating the neighborhood of each solution in the population, this factor is reduced. The complexity of generating a single neighborhood depends on the problem at hand and the generation strategy.

Selecting the Next Population. In this step, only selecting the next population out of the candidate neighbors is considered. If fewer than M improving neighbors are found, then regardless of strategy all of them become the next population, which takes O(1) time. If more than M improving neighbors are found, the time required to select the next population (M units out of all improving neighbors) depends on the selection strategy, the number of available improving neighbors, and whether parallelization is utilized. If the first M improving neighbors are immediately received from the threads and selected, the cost of this step is O(1). On the other hand, if the M best or M random improving neighbors are selected, receiving (at most) M neighbors from each of (at most) M neighborhoods, assuming that each neighborhood is generated in parallel and keeps track of its improving solutions, gives a dominating time complexity of \(O(M^2)\) per iteration, since M improving solutions need to be selected among \(M^2\) possible candidates.

3.3 Usage Advice

In order to get the most out of the proposed algorithm, we give some guidelines to improve quality and/or time performance of the implementation. We divide the guidelines into suggestions for quality and suggestions for time performance, noting that sometimes there is a trade-off between these two properties.

Suggestions for Quality. Find as good a starting solution as possible. A better starting solution should reduce the average population size per iteration and the number of algorithm iterations, which reduces the overall execution time. The proposed algorithm is best used for the intensification phase of the optimization procedure, after a reasonable starting solution has already been found.

Permute the strategies between iterations. Permuting the list of neighborhood generation strategies from iteration to iteration adds to the randomness of the search procedure and can often help produce better solutions. This claim is supported by our experimental results which are presented in Figs. 1, 2 and 3.

Select a larger population when possible. Selecting a larger population will add to the execution time of the algorithm, but will make the search procedure more exhaustive and thus increase the chance of producing better solutions. This claim is supported by our experimental results which are presented in Figs. 1, 2 and 3.

Suggestions for Time Performance. Sort the strategies in ascending order by size. If the neighborhood generation strategies are not being permuted between iterations (see Subsect. 3.3), sorting them in ascending order by the number of generated neighbors could help reduce the overall execution time, because then the faster strategies are used more often. This, however, depends on the assumption that different strategies have a similar chance of producing an improving neighbor, which is sometimes not the case.

Parallelize the neighborhood generation step. If the resources are available, the neighborhood of each solution in the population can be generated concurrently, in a separate thread, since the generated neighborhoods are mutually independent. This often yields a very significant reduction in the overall execution time. Each thread can send every improving neighbor, as soon as it is found, to the main thread. With the first-M selection strategy, neighborhood generation can be aborted once the main thread has received a total of M improving neighbors. Note that when the neighbors of a single solution are examined, even if more than M improving solutions are found, only M of them need to be returned to the main procedure, because at most M will ultimately be selected. This means a list of improving solutions can be pre-filtered in a neighborhood-generating thread, depending on the selection strategy.
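Under the stated assumptions, the parallel generation step with per-thread pre-filtering might look as follows; this is a sketch using Python threads, and the function names are ours.

```python
from concurrent.futures import ThreadPoolExecutor

def improving_neighbors(population, cost, neighborhood, M):
    """Generate the neighborhood of each solution in its own thread and
    collect improving neighbors.  Each worker pre-filters its own
    neighborhood and returns at most M improving solutions, since at
    most M will ultimately be selected by the main procedure."""
    best_cost = min(cost(s) for s in population)

    def worker(solution):
        found = []
        for n in neighborhood(solution):
            if cost(n) < best_cost:  # improving w.r.t. the population best
                found.append(n)
                if len(found) == M:  # pre-filtering inside the worker
                    break
        return found

    # One thread per solution: the neighborhoods are mutually independent.
    with ThreadPoolExecutor(max_workers=len(population)) as pool:
        results = list(pool.map(worker, population))
    return [n for found in results for n in found]
```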

4 Conclusion

We have presented a new variation of the VND algorithm for discrete optimization, applicable to many combinatorial problems in smart solution development, such as the capacitated vehicle routing problem. By introducing a population of current solutions, we have made the search more exhaustive and thus able to produce better solutions. The algorithm should be used in the intensification phase of solving the optimization problem. The increase in execution time is alleviated by parallelization and justified by the better search results. For the capacitated vehicle routing problem, we have shown the superiority of the proposed algorithm over VND. Future work in this area will explore ways of determining improving solutions, hybrid algorithms, and other applications in the context of intelligent services.