Keywords

1 Introduction

Multi-objective combinatorial optimization problems (MOCOPs), such as traveling salesman problem, vehicle routing planning, participant selection problem of crowdsensing and so on, is to find the optimal resource assignment that takes multiple objectives into consideration [1, 2]. To address the issues, the classical linear programming, meta-heuristic search, evolutionary algorithm, and some other population-based intelligent optimization algorithms are introduced. However, with the increasing of the decision variables, the exploration ability of above-mentioned algorithms become limited due to the low efficiency for generating diverse offspring by a stochastic mechanism. Many researchers [3,4,5,6,7] employed the handcrafted strategies that were especially designed according to the characteristics of specific problems to improve the performance. However, such problem-specific methods depends a lot on the artificial experience. The recent advances [8,9,10,11] in machine learning algorithms have shown their strong ability of helping engineers to design optimization algorithms with learning ability to solve different problems with a relatively good performance. Based on this, various studies [12,13,14] have tried to utilize the neurol networks to learn the useful information about the fitness landscape or the distribution of the individuals to generate the more rational offspring. But a large number of the training data must be provided for building the learning models, which is always difficult or expensive to be achieved in practice.

The Generative Adversarial Networks (GAN) proposed by Goodfellow in 2014 is able to learn high-dimensional distributions efficiently with the limited training data by the following adversarial learning mechanism [15], which consists of a generator and a discriminator [16]. The former learns the distribution \( P_{data} (x) \) of a real data x and generates a new sample \( G(z) \) with the prior distribution \( P_{prior} (x) \), while the latter works hard on identifying whether the sample is real or fake, and outputs a discriminant probability \( D(x) \). More specifically, the generator tries to produce the samples as real as possible, with the purpose of decreasing the accuracy of a discriminator. By contrast, the discriminator makes an effort on enhancing its recognition ability. Both of them are trained in a minimax game manner as follows.

$$ \mathop {\hbox{min} }\limits_{G} \mathop {\hbox{max} }\limits_{D} V(D,G) = E_{{x \in P_{data} }} [\log D(x)] + E_{{z \in P_{prior} }} [\log (1 - D(G(z))] $$
(1)

Various studies have been done on GAN-based optimization algorithms, in which the offspring is generated along the direction learned from the distribution of the better candidate solutions. Tan et al. [17] proposed a generative adversarial optimization framework for continuous optimization problems with a single objective. A new solution is produced from a candidate with the input noise by the generator, and then the discriminator predicts whether the fitness value of a generated solution is better than that of the original candidate. He et al. [15] first presented a GAN-based multi-objective evolutionary algorithm. The candidates are classified into two datasets that are labeled as real and fake, respectively. After training the generator with the above samples, the offspring is formed by it or the genetic operators with the same probability. Despite the application of GAN on continuous optimization problems, fewer works have been done on the combinatorial optimization problems. Probst [18] employed GAN to build the probability model that approximates the distribution of the candidates in the estimation of distribution algorithm. The offspring are gotten from the probability model, with the purpose of finding the optima for a scalar combinatorial optimization problem. Being different from it, a multi-objective combinatorial generative adversarial optimization algorithm (MOCGAO) is proposed for the MOCOPs in this paper. And the main contributions of this study are summarized as follows:

  1. (1)

    MOCGAO takes advantage of the learning and generative abilities of GAN to generate superior solutions for the MOCOPs.

  2. (2)

    A classification strategy is developed to identify the non-dominated solutions found in the evolution as the real samples for training GAN, as they provide the distribution information of the population.

  3. (3)

    An offspring reproduction strategy is designed to form a new feasible solution from the decimal output of the generator for the MOCOPs.

  4. (4)

    The proposed method is applied to solve the participant selection problem of the crowdsensing and the detailed offspring reproduction strategy is given.

The rest of the paper is organized as follows. In Sect. 2, the key issues of multi-objective combinatorial generative adversarial optimization are presented in detail. MOCGAO-based participant selection strategy for the crowdsensing is illustrated in Sect. 3. The experimental results are compared and further analyzed in Sect. 4. Finally, Sect. 5 concludes the whole paper and plans the topic to be researched in the future.

2 Multi-objective Combinatorial Generative Adversarial Optimization

According to the algorithm steps of MOCGAO shown in Algorithm 1, the initial population P(0) is constructed by the randomly produced individuals and the optima of each objective obtained by the greedy algorithm. The latter ones provide the extremum for training GAN so as to speed up the convergence of the network. After training GAN by the classified individuals, the generator outputs the offspring Q(t). N individuals selected from the combination of P(t) and Q(t) by the non-domination sort and elite selection strategy of NSGA-II, compose of the population in the next generation P(t + 1). Finally, the Pareto-optimal solution is found until the termination condition is satisfied. Apparently, the key issues of the proposed method are to classify the population, train GAN and reproduce the offspring. The detailed selection strategy of NSGA-II refer to Ref. [19].

figure a

2.1 The Classification of the Population

Both the real and fake samples provide effective information for training GAN. For image processing [20,21,22] and text generation [23], the real data distributions of the images and texts are normally obtained in advance. However, it is difficult to know the fitness landscape, even the true Pareto-optimal solutions before the evolution of actual multi-objective optimization problems. To address the issue, He et al. [15] partitioned the population in each generation into two datasets with the same size, and the one with the better convergence and diversity is treated as real samples. However, both the non-dominate individuals and the dominated ones with the even distribution are classified to form the training data. The latter is conducive to form the GAN that can produce the offspring with better diversity, but the distribution of the worse individuals also is learned by GAN, which slows down the convergence speed.

Different from it, only the non-dominated individuals are labeled as real and employed to form the real dataset \( x^{r} \). The rest of the population are classified into the fake dataset \( x^{f} \). Because GAN iteratively learns only the distribution of better solutions, the generated offspring may more approximate to the true Pareto solutions.

$$ x_{i} \in \left\{ {\begin{array}{*{20}l} {x^{r} } \hfill & {x_{i} \;{\text{is}}\;{\text{a}}\;{\mbox{non-dominated}}\;{\text{solution}}} \hfill \\ {x^{f} } \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right. $$
(2)

2.2 Training GAN

The generator and discriminator of MOCGAO both consist of the feedforward neural network with a single hidden layer, as shown in Fig. 1. Each node in the hidden layer employs the ReLu activation function, while the nodes in the output layer adopt the sigmoid function.

Fig. 1.
figure 1

The structure of GAN

The noise obeying the continuous uniform distribution \( Z \sim U( - 1,1) \) is employed as the input signal to the generator. Following that, the output of \( G(z) \) is transformed into an offspring \( \tilde{x} \) by the offspring reproduction strategy \( Off( \, ) \). The generated sample \( \tilde{x} \) is utilized to calculate the loss function of the generator as follows.

$$ \tilde{V}_{\text{G}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\log (1 - D({\text{O}}ff(G(z_{i} ))))} $$
(3)

Subsequently, the real samples \( x^{r} \), the fake ones \( x^{f} \) and the generated ones \( \tilde{x} \) are all employed to train the discriminator. The comprehensive loss function is defined.

$$ \tilde{V}_{\text{D}} = \frac{1}{{\left| {x^{r} } \right|}}\sum\nolimits_{i = 1}^{{\left| {x^{r} } \right|}} {\log D(x_{i}^{r} )} + \frac{1}{{\left| {x^{f} } \right|}}\sum\nolimits_{i = 1}^{{\left| {x^{f} } \right|}} {\log (1 - D(x_{i}^{f} ))} + \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\log (1 - D(\tilde{x}_{i} ))} $$
(4)

The parameters of GAN are learned by the Adam algorithm [24], in which each parameter remains the various learning rate, instead of the traditional gradient descent method.

2.3 The Offspring Reproduction Strategy

For most of the combinatorial optimization problems, a solution is normally encoded by the binary or the integer. However, the outputs of the generator in GAN is a decimal between 0 and 1, which cannot represent an individual of a combinatorial optimization problem directly. To this end, a novel offspring reproduction strategy is presented, with the purpose of forming an available individual based on the decimal output of GAN.

The number of neurons in the output layer of a generator is equal to the dimension of decision variables. For a binary-coded individual, the output of each neuron is treated as the probability of each gene being set to 1. In the offspring reproduction strategy, the output of all neurons in the generator are sorted in the descending order, and the variable corresponding to the first unselected one is labeled by 1. Different from it, the possible integer-values of the decision variables are assigned to the genes corresponding to the sorted neurons in the same descending order. That is, the maximum integer-value is first set to the gene corresponding to the neuron having the maximum output. As shown in Fig. 2, the same decimal output of a generator is mapped to the different offspring in terms of the encoding scheme of an individual.

Fig. 2.
figure 2

An example of an offspring reproduction strategy

3 MOCGAO-Based Participant Selection of a Crowdsensing System

To verify the effectiveness of the proposed MOCGAO algorithm, it is applied to the participant selection problem of Crowdsensing, which allocates all the sensing tasks to suitable participants that are selected from the sensing-user set. Assume that the sensing tasks denoted as \( T = \{ t_{1} ,t_{2} , \ldots ,t_{m} \} \) are published on the crowdsensing system, and the mobile users expressed by \( U = \{ u_{1} ,u_{2} , \ldots ,u_{n} \} \) want to participate the works as executors. Suppose that each task need to be fulfilled by \( \xi \) users, and each user is assigned to at most one task. Let \( e_{ij} \) and \( a_{ij} \) be the sensing ability and reward of the user \( u_{j} \) for the task \( t_{i} \), respectively. Moreover, the sensing ability is determined by the distance between the task and the user, and the reward depends on the sharing mechanism of reward for the user and his cooperative friend. More details refer to Reference [25]. Based on this, a participant selection model that maximize both the sensing quality of tasks and the reward of participants can be constructed as follow:

$$ \begin{aligned} & \left\{ {\begin{array}{*{20}c} {\hbox{max} \;f_{1} = \sum\limits_{i} {\sum\limits_{j} {e_{ij} x_{ij} } } } \\ {\hbox{max} \;f_{2} = \sum\limits_{i} {\sum\limits_{j} {a_{ij} x_{ij} } } } \\ \end{array} } \right. \\ & \begin{array}{*{20}c} {s{.}t{.}} & {\xi = \sum\limits_{j} {x_{ij} } ,\forall i \in [1,m]} \\ {} & {\sum\limits_{i} {x_{ij} \le 1} ,\forall i \in [1,n]} \\ \end{array} \\ \end{aligned} $$
(5)

In the above formula, \( [x_{ij} ]_{m \times n} \) is the Task-User matrix, as shown in Fig. 3(a). \( x_{ij} = 1 \) means that the task is allocated to the jth user. The Task-User matrix is converted to the binary vector, as shown in Fig. 3(b), with the purpose of simplifying the training process of GAN.

Fig. 3.
figure 3

An example of an individual (\( m = 2,n = 6,\xi = 2 \))

According to the above-mentioned encoding scheme of the participant selection model, the output of each neuron in the output layer of a generator represents the probability of a user being assigned to a task. In order to obtain the feasible offspring by the proposed offspring reproduction strategy, the number of neurons is set to m*n. For each task, \( \xi \) unallocated users are selected to complete the task in the descending order of the corresponding probability, and the corresponding gene is set to 1.

As shown in Fig. 4, two tasks are allocated to six users in the crowdsensing system, and each task needs two participants. For the task \( t_{1} \), the two users with the maximum probability, \( u_{1} \) and \( u_{6} \), are selected. The first and sixth genes are set to 1. \( u_{4} \) and \( u_{5} \) are chosen to carry out \( t_{2} \). \( u_{6} \) having a higher probability than \( u_{5} \) is not employed by this task because of the constraint for the available users.

Fig. 4.
figure 4

An example of the offspring reproduction (\( m = 2,n = 6,\xi = 2 \))

4 Experimental Results and Discussion

Fully experiments are conducted to examine the performance of the proposed MOCGAO. The main experimental parameters are listed in Table 1, and the other parameters of the participant selection problem refer to paper [25]. Besides, the hypervolume (HV) and coverage(C) metrics [26] are employed to evaluate the performance of the proposed algorithm, and the best results are labeled by bold.

Table 1. The main parameters

To verify the effectiveness of the Adam method, we compare the performances of MOCGAO with the Adam method (Adam) against one with the gradient descent method (Gradient). From the statistical results listed in Table 2, no matter how many users participate to complete the tasks, MOCGAO with the Adam method converges to the best Pareto solutions because the Adam adopts the adaptive learning rate for each parameter instead of the static on in the gradient descent method, which is more efficient for training GAN.

Table 2. Comparison of the performances for MOCGAOs with different learning methods

The rationality of the Greedy-based initialization method (Greedy) is analyzed by comparing the performance of MOCGAO with the random initialization strategy (Random), as listed in Table 3. Apparently, the extreme of each objective provides more information on the true distribution of the Pareto front, which is helpful for speeding up the training process of GAN and generating the more promising offspring. Thus, MOCGAO with the Greedy-based initialization method can found better Pareto-optimal solutions, showing the larger HV-values.

Table 3. Comparison of the performances for MOCGAOs with different initialization strategies

Two representative multi-objective evolutionary algorithms, including NSGA-II [19] and MOEA/D [27] are employed as the comparison algorithms. Figure 5 depicts the Pareto fronts obtained by the three compared algorithms. It can be observed that the Pareto-optimal solutions of all instances generated by NSGA-II have the worst convergence, while MOEA/D tends to generate the Pareto solutions distributed in a small region, but with the better convergence. In contrast, the Pareto fronts found by the proposed MOCGAO achieve the best performance in both convergence and distribution for 11 out of 12 instances, except for the 10*100 instance. The statistical results of HV and C summarized in Tables 4 and 5 confirm the above observations. Consequently, the proposed algorithm outperforms the other compared algorithms due to the better diversity of the offspring maintained by the improved GAN, especially for the problems with the large-scale participants.

Fig. 5.
figure 5

The Pareto fronts found by different algorithms

Table 4. Comparison of HV for different algorithms
Table 5. Comparison of C for different algorithms

5 Conclusions

To overcome the weakness of the traditional evolutionary algorithms on solving multi-objective combinatorial optimization problems with the large-scale decision variables, a generative adversarial network that has the strong learning and generative abilities is introduced to construct a multi-objective combinatorial generative adversarial optimization algorithm. The extreme of each objective obtained by the greedy algorithm is combined with the randomly produced individuals to form the initial population, with the purpose of speeding up the training process of GAN. During the evolution, the optimal non-dominated solutions in the current generation are identified as the real samples, while the rest are fake. Classified solutions are employed to train GAN. More specifically, the Adam method with the adaptive learning rate is employed to update the parameters of GAN, and an offspring reproduction strategy is presented to obtain a feasible offspring from the decimal output of the generator. Following that, the proposed algorithm is utilized to solve the participant selection problem of the crowdsensing, and a detailed offspring reproduction strategy is given. The experiments are conducted on the crowdsensing system with the various tasks and participants, and the results show that the proposed algorithm superior to the others in both convergence and distribution. To combine the evolutionary operators with GAN will be our future work, with the purpose of generating the offspring with more uniform distribution.