In this paper, we introduce DuelGAN, a generative adversarial network (GAN) solution to improve the stability of the generated samples and to mitigate mode collapse. Built upon the Vanilla GAN's two-player game between the discriminator \(D_1\) and the generator \(G\), we introduce a peer discriminator \(D_2\) to the min-max game. Similar to previous work using two discriminators, the first role of both \(D_1\) and \(D_2\) is to distinguish between generated samples and real ones, while the generator tries to generate high-quality samples that are able to fool both discriminators. Different from existing methods, we introduce a duel between \(D_1\) and \(D_2\) to discourage their agreement and therefore increase the level of diversity of the generated samples. This property alleviates the issue of early mode collapse by preventing \(D_1\) and \(D_2\) from converging too fast. We provide theoretical analysis for the equilibrium of the min-max game formed among \(G, D_1, D_2\), and we analyze the convergence behavior of DuelGAN as well as the stability of the min-max game. It is worth mentioning that DuelGAN operates in the unsupervised setting, and the duel between \(D_1\) and \(D_2\) does not need any label supervision. Experimental results on a synthetic dataset and on real-world image datasets (MNIST, Fashion MNIST, CIFAR-10, STL-10, CelebA, VGG) demonstrate that DuelGAN outperforms competitive baseline work in generating diverse and high-quality samples, while introducing only negligible computation cost. Our code is publicly available at https://github.com/UCSC-REAL/DuelGAN.
J. Wei and M. Liu—Equal contributions.
JHW and YL are partially supported by the National Science Foundation (NSF) under grants IIS-2007951, IIS-2143895, and CCF-2023495. MHL, JHL, and JD are supported in part by WISEautomotive through an ATC+ Program award from the Korean Ministry of Trade, Industry and Energy (MOTIE).
Electronic supplementary material
The appendix is organized as follows:
Sect. A gives the detailed algorithm of DuelGAN.
Sect. B includes the omitted proofs for all theoretical conclusions in the main paper.
Sect. C includes experiment details and additional experiment results.
A The DuelGAN Algorithm
Our duel game between two discriminators is inspired by the peer prediction mechanism, which has been applied successfully to the design of robust loss functions [9, 33, 55, 56]. To give a more detailed and practical description of the duel game, we summarize the overall DuelGAN algorithm in Algorithm 1. In experiments, we train \(G\) to minimize \(\log (1-D_i(G(z)))\), which is equivalent to maximizing \(\log D_i(G(z))\).
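The two generator objectives mentioned above can be compared with a small sketch. This is an illustration only, not the released implementation: the function names and the toy discriminator scores are hypothetical, and the loss is simply summed over the two discriminators as an assumption.

```python
import math

def g_loss_saturating(d1_fake, d2_fake):
    """Sum over both discriminators of mean log(1 - D_i(G(z))); G minimizes this."""
    return sum(
        sum(math.log(1.0 - s) for s in scores) / len(scores)
        for scores in (d1_fake, d2_fake)
    )

def g_loss_non_saturating(d1_fake, d2_fake):
    """Negated sum of mean log D_i(G(z)); minimizing it maximizes log D_i(G(z))."""
    return -sum(
        sum(math.log(s) for s in scores) / len(scores)
        for scores in (d1_fake, d2_fake)
    )

# Hypothetical discriminator outputs on generated samples (values in (0, 1)).
weak_scores = [0.1, 0.2]     # discriminators easily reject the samples
strong_scores = [0.8, 0.9]   # samples mostly fool the discriminators

# Both losses decrease as the generated samples fool the discriminators more.
sat_improves = g_loss_saturating(strong_scores, strong_scores) < g_loss_saturating(weak_scores, weak_scores)
non_sat_improves = g_loss_non_saturating(strong_scores, strong_scores) < g_loss_non_saturating(weak_scores, weak_scores)
```

Both forms push \(D_i(G(z))\) upward; they differ in gradient magnitude when the discriminators confidently reject the generated samples.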

B Omitted Proofs
B.1 Proof of Proposition 1
We first introduce Lemma 1, which helps with the proof of Proposition 1.
Lemma 1
For any \((a,b)\in \mathbb {R}^2\setminus \{0, 0\}\), the function \(y\rightarrow a\log (y)+b\log (1-y)\) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).
Denote \(f(y):=a\log (y)+b\log (1-y)\). Clearly, when \(y=0\) or \(y=1\), \(f(y)=-\infty \). For \(y\in (0,1)\), we have:
\[f'(y)=\frac{a}{y}-\frac{b}{1-y}=\frac{a-(a+b)y}{y(1-y)}.\]
Note that \(f'(y)>0\) if \(0<y<\frac{a}{a+b}\) and \(f'(y)<0\) if \(\frac{a}{a+b}<y<1\). Thus, the maximum of \(f(y)\) on \([0,1]\) is \(\max (f(0), f(\frac{a}{a+b}), f(1))=f(\frac{a}{a+b})\); that is, \(f(y)\) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).
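As a quick numerical sanity check of Lemma 1, the sketch below compares the closed-form maximizer \(\frac{a}{a+b}\) with a brute-force grid search. The weights `a` and `b` are arbitrary positive values chosen for illustration.

```python
import math

def f(y, a, b):
    """f(y) = a*log(y) + b*log(1 - y), evaluated for 0 < y < 1."""
    return a * math.log(y) + b * math.log(1.0 - y)

def grid_argmax(a, b, steps=100_000):
    """Return the grid point in (0, 1) at which f(., a, b) is largest."""
    grid = (k / steps for k in range(1, steps))
    return max(grid, key=lambda y: f(y, a, b))

a, b = 0.3, 0.9                 # hypothetical positive weights
y_closed_form = a / (a + b)     # maximizer predicted by Lemma 1
y_grid = grid_argmax(a, b)      # maximizer found by exhaustive search
```

The two values agree up to the grid resolution, as the lemma predicts.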
Now we proceed to prove Proposition 1.
Proof of Proposition 1
The training criterion for the discriminator \(D_i\), given any generator \(G\), is to maximize the quantity \(\mathcal {L}(D_1, D_2,G)\). Recall that:
We then have:
According to Lemma 1, the above objective function achieves its maximum with respect to \(D_1\) and \(D_2\), each taking values in [0, 1], at:
With the introduction of the duel game, the distributions \(p_{\text {data}}\) and \(p_{g}\) of the Vanilla GAN change due to the appearance of \(p_{\text {duel}}\). Thus, we define the corresponding updated distributions in DuelGAN w.r.t. discriminator \(D_i\) as \(p_{\text {data}_i}\) and \(p_{g_i}\), respectively:
B.2 Proof of Theorem 1
When \(\alpha =0\), \(r_{j,G}(x)=1/2\), and for \(i=1,2\) we have:
This allows us to rewrite C(G)/2 as:
\(\Longrightarrow \) Note that \(2\cdot \big (\mathbb E_{x\thicksim p_{\text {data}_i}}[-\log 2]+\mathbb E_{x\thicksim p_{g_i}}[-\log 2]\big )=-\log {16}\), by subtracting this expression from C(G), we have:
where KL is the Kullback-Leibler divergence. Note that:
Since the Jensen-Shannon divergence between two distributions is always non-negative and is zero only when they are equal, \(C(G)^*=-\log {16}\) is the global minimum of \(C(G)\). Thus, we need
\(\Longleftarrow \) Given that \(p_{\text {data}}=p_{g}\), we have:
B.3 Proof of Theorem 4
For the Vanilla GAN, there exists a set \(S\) in the instance domain \(\mathcal {X}\) such that \(p_{\text {data}}(S)\ge \delta \) and \(p_{\text {g}}(S)\le \epsilon \). For this set \(S\), given by the \((\epsilon , \delta )\)-mode collapse of the Vanilla GAN, we have:
For either discriminator \(D_i\) in the DuelGAN, we have:
We then have:
B.4 Proof of Theorem 3
Ignoring the weight \(\beta \), the duel term of discriminator \(D_i\) w.r.t. its diverged peer discriminator \(\widetilde{D}_{j}\) becomes:
where we define:
for a clear presentation. Continuing the previous derivation, we then have:
Note that:
Thus, given \(\alpha =1\), the Bias term is cancelled out. When \(e_{\text {data}, j}+e_{g, j}<1\), we have:
and we further have:
B.5 Proof of Theorem 4
When \(\beta =0\), the overall min-max game becomes:
Since we assume enough capacity, the inner maximum is achieved if and only if \(D_1(x)=D_2(x)=\frac{p_{\text {data}}(x)}{p_{\text {data}}(x)+p_g(x)}\). To prove that \(p_g\) converges to \(p_{\text {data}}\), we only need to reproduce the proof of Proposition 2 in [16]. We omit the details here.
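The optimal discriminators stated above can be checked numerically on a small discrete domain. The sketch below assumes that the \(\beta =0\) objective is the sum of two Vanilla GAN value functions, one per discriminator; the distributions and function names are hypothetical.

```python
import math

def optimal_discriminator(p_data, p_g):
    """Pointwise optimum D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def vanilla_value(d, p_data, p_g):
    """E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a discrete domain."""
    return sum(
        pd * math.log(di) + pg * math.log(1.0 - di)
        for di, pd, pg in zip(d, p_data, p_g)
    )

def value_beta_zero(d1, d2, p_data, p_g):
    """Assumed beta = 0 objective: one Vanilla GAN value per discriminator."""
    return vanilla_value(d1, p_data, p_g) + vanilla_value(d2, p_data, p_g)

p_data = [0.5, 0.3, 0.2]   # toy data distribution
p_g = [0.2, 0.3, 0.5]      # toy generator distribution

d_star = optimal_discriminator(p_data, p_g)
best = value_beta_zero(d_star, d_star, p_data, p_g)

# Any other discriminator pair gives a smaller value.
d_shifted = [d + 0.1 for d in d_star]
worse = value_beta_zero(d_shifted, d_shifted, p_data, p_g)

# When p_g matches p_data, D* = 1/2 everywhere and the value is -log 16.
d_half = optimal_discriminator(p_data, p_data)
matched = value_beta_zero(d_half, d_half, p_data, p_data)
```

In this toy setting, `matched` equals \(-\log 16\) and is smaller than `best`, so under the stated assumption the generator can lower the value at the optimal discriminators only by moving \(p_g\) toward \(p_{\text {data}}\).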
C Experiment Details and Additional Results
Model Architectures. For the small-scale datasets, we use a shallow version of the generator and discriminator: three convolution layers in the generator and four layers in the discriminators. For natural scene and human face image generation, we use a deep version, which has three convolution layers in the generator and seven layers in the discriminators. The deep version is the original design of DCGAN [45]. The peer discriminator is an identical copy of the first one.
C.1 Architecture Comparison Between GAN, D2GAN and DuelGAN
Fig. 5 shows the architecture designs of the single-discriminator GAN, the dual-discriminator D2GAN, and our proposed DuelGAN. Compared with the Vanilla GAN, DuelGAN has one more identical discriminator and a competitive duel game between the two discriminators. The introduced duel game induces diversified generated samples by discouraging agreement between \(D_1\) and \(D_2\). In D2GAN, although the two discriminators are trained with different loss functions, they do not interfere with each other during training.
C.2 Additional Experiment Results
StyleGAN-ADA [24] is the state-of-the-art method in image generation. We applied our duel game to StyleGAN-ADA and further improved its performance. On the CelebA [34] dataset, FID improved from 4.85 to 4.52, and on the FFHQ-10k [25] dataset, FID improved from 7.24 to 6.01. We show the generated image results (trained on CelebA) in Fig. 6.
C.3 Additional Experiment Details
Hyper-parameters. DuelGAN achieves low FID scores when \(\alpha \) and \(\beta \) are simply set to constant values. However, we found that we could obtain an approximately 10% improvement through dynamic tuning. The parameter \(\beta \) controls the overall weight of \(\text {Duel-D}\), while \(\alpha \) penalizes the case where \(D_1\) over-agrees with \(D_2\). In the early training phase, when the generator and discriminators are unstable, we set \(\alpha \) and \(\beta \) to 0. As training progresses, we gradually increase these parameters to a maximum value, which helps mitigate vanishing gradients. After the midpoint of training, we decrease these parameters to help the discriminators converge, until they reach approximately 0 at the end of the training process. We adopt 0.3 and 0.5 as the maximum values of \(\alpha \) and \(\beta \), respectively.
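The dynamic tuning described above can be sketched as a simple schedule. The text does not specify the exact shape of the ramp, so the linear (triangular) profile below is an assumption, and the function name `duel_weight` is hypothetical.

```python
def duel_weight(step, total_steps, max_value):
    """Triangular schedule: 0 at the start, max_value at the midpoint, 0 at the end.

    The linear ramp is an assumed shape; only the start, peak, and end
    values follow the description in the text.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_value * (1.0 - abs(2.0 * progress - 1.0))

ALPHA_MAX = 0.3   # maximum value of alpha reported in the text
BETA_MAX = 0.5    # maximum value of beta reported in the text

total_steps = 100
alpha_schedule = [duel_weight(t, total_steps, ALPHA_MAX) for t in range(total_steps + 1)]
beta_schedule = [duel_weight(t, total_steps, BETA_MAX) for t in range(total_steps + 1)]
```

At each training step, the current \(\alpha \) and \(\beta \) would be read from the schedule before computing the duel term.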
C.4 Ablation Study of DuelGAN
During training, we initialize \(\alpha \) and \(\beta \) to 0 and gradually increase them to the chosen maximum values. We find experimentally that \(\alpha = 0.3\) and \(\beta = 0.5\) achieve the best FID score on the datasets we tested. Table 3 shows a thorough ablation of different hyper-parameter settings on the STL-10 dataset. Bold text marks the best \(\alpha \) setting when \(\beta \) is fixed.
C.5 Stability of Training
We now empirically show the stability of the DuelGAN training procedure. We adopt the STL-10 dataset and \(\beta =0.25\) for illustration. In Figs. 8 and 9, we visualize the losses of the two discriminators during training on STL-10. The red lines indicate the smoothed trend of the loss evaluated on the generated images and real images. The actual (unsmoothed) losses are represented by the shaded red lines. Although there exist certain unstable episodes (where the difference between the smoothed loss and the actual loss is large) for both discriminators, the overall trend of both discriminators is stable. Moreover, we observe that \(D_1\) and \(D_2\) hardly ever experience unstable episodes at the same time. This phenomenon further supports our conclusion in Theorem 2: an unstable or diverged discriminator hardly disrupts the training of its peer discriminator.
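One way to make the notion of an unstable episode concrete is sketched below. The smoothing method and the threshold are not stated in the text, so the exponential moving average, the `decay` value, and the tolerance `tol` are all assumptions chosen for illustration, as is the toy loss trace.

```python
def ema(values, decay=0.9):
    """Exponential moving average of a loss trace (assumed smoothing method)."""
    smoothed, running = [], None
    for v in values:
        running = v if running is None else decay * running + (1.0 - decay) * v
        smoothed.append(running)
    return smoothed

def unstable_steps(values, decay=0.9, tol=0.5):
    """Indices where the actual loss deviates from its smoothed trend by more than tol."""
    return [
        i
        for i, (v, s) in enumerate(zip(values, ema(values, decay)))
        if abs(v - s) > tol
    ]

# Hypothetical discriminator loss trace with one spike at step 3.
d1_losses = [0.7, 0.7, 0.7, 3.0, 0.7, 0.7]
d1_unstable = unstable_steps(d1_losses)
```

Running the same function on both discriminators' traces and intersecting the resulting index sets would indicate whether their unstable episodes coincide.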
Agreement Between the Two Discriminators. We also empirically estimate the agreement level between the two discriminators during training. In Fig. 10, the \(y\)-axis denotes the percentage of predictions on which \(D_1\) and \(D_2\) reach a consensus. The smoothed curve depicts the overall change of the agreement level. At the initial stage, \(D_i\) is not encouraged to over-agree with its peer discriminator \(D_j\). As the training progresses, the agreement level gradually increases to a high value to help convergence. The shaded red line shows that the actual agreement level fluctuates around the smoothed line, which introduces a certain degree of randomness and prevents the discriminators from getting stuck in a local optimum.
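The agreement level described above can be estimated from the two discriminators' outputs on a batch. In this sketch, a prediction counts as "real" when the score exceeds 0.5; that threshold, the function name, and the toy scores are assumptions.

```python
def agreement_rate(d1_scores, d2_scores, threshold=0.5):
    """Fraction of samples on which both discriminators make the same real/fake call."""
    if len(d1_scores) != len(d2_scores) or not d1_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    agreements = sum(
        (s1 > threshold) == (s2 > threshold)
        for s1, s2 in zip(d1_scores, d2_scores)
    )
    return agreements / len(d1_scores)

# Hypothetical discriminator outputs on the same four samples.
d1_batch = [0.9, 0.2, 0.6, 0.4]
d2_batch = [0.8, 0.7, 0.55, 0.1]
rate = agreement_rate(d1_batch, d2_batch)   # the two disagree on the second sample only
```

Logging this rate once per batch and smoothing it over training would yield a curve comparable in spirit to the one described for Fig. 10.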
Wei, J., Liu, M., Luo, J., Zhu, A., Davis, J., Liu, Y. (2022). DuelGAN: A Duel Between Two Discriminators Stabilizes the GAN Training. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_18
