Abstract
In this paper, we introduce DuelGAN, a generative adversarial network (GAN) solution to improve the stability of the generated samples and to mitigate mode collapse. Built upon the Vanilla GAN's two-player game between the discriminator \(D_1\) and the generator G, we introduce a peer discriminator \(D_2\) into the min-max game. Similar to previous work using two discriminators, the first role of both \(D_1\) and \(D_2\) is to distinguish between generated samples and real ones, while the generator tries to generate high-quality samples which are able to fool both discriminators. Different from existing methods, we introduce a duel between \(D_1\) and \(D_2\) to discourage their agreement and therefore increase the diversity of the generated samples. This property alleviates the issue of early mode collapse by preventing \(D_1\) and \(D_2\) from converging too fast. We provide theoretical analysis of the equilibrium of the min-max game formed among \(G, D_1, D_2\), and characterize the convergence behavior of DuelGAN as well as the stability of the min-max game. It is worth mentioning that DuelGAN operates in the unsupervised setting, and the duel between \(D_1\) and \(D_2\) does not need any label supervision. Experimental results on a synthetic dataset and on real-world image datasets (MNIST, Fashion MNIST, CIFAR-10, STL-10, CelebA, VGG) demonstrate that DuelGAN outperforms competitive baselines in generating diverse and high-quality samples, while introducing only negligible computational cost. Our code is publicly available at https://github.com/UCSC-REAL/DuelGAN.
J. Wei and M. Liu—Equal contributions.
References
Albuquerque, I., Monteiro, J., Doan, T., Considine, B., Falk, T., Mitliagkas, I.: Multi-objective training of generative adversarial networks with multiple discriminators. arXiv preprint arXiv:1901.08680 (2019)
Aneja, J., Schwing, A., Kautz, J., Vahdat, A.: A contrastive learning approach for training variational autoencoder priors. In: Advances in Neural Information Processing Systems vol. 34 (2021)
Antipov, G., Baccouche, M., Dugelay, J.L.: Face aging with conditional generative adversarial networks. In: 2017 IEEE international conference on image processing (ICIP), pp. 2089–2093. IEEE (2017)
Arbel, M., Sutherland, D., Bińkowski, M., Gretton, A.: On gradient regularizers for mmd gans. In: Advances in neural information processing systems, pp. 6700–6710 (2018)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)
Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)
Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: Vggface2: A dataset for recognising faces across pose and age. In: International Conference on Automatic Face and Gesture Recognition (2018)
Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In: Advances in neural information processing systems, pp. 2172–2180 (2016)
Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., Liu, Y.: Learning with instance-dependent label noise: A sample sieve approach. In: International Conference on Learning Representations (2021), https://openreview.net/forum?id=2VXyy9mIyU3
Coates, A., Ng, A., Lee, H.: An Analysis of Single Layer Networks in Unsupervised Feature Learning. In: AISTATS (2011). https://cs.stanford.edu/acoates/papers/coatesleeng_aistats_011.pdf
Dash, A., Gamboa, J.C.B., Ahmed, S., Liwicki, M., Afzal, M.Z.: Tac-gan-text conditioned auxiliary classifier generative adversarial network. arXiv preprint arXiv:1703.06412 (2017)
Dieng, A.B., Ruiz, F.J., Blei, D.M., Titsias, M.K.: Prescribed generative adversarial networks. arXiv preprint arXiv:1910.04302 (2019)
Durugkar, I., Gemp, I., Mahadevan, S.: Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673 (2016)
Ghosh, A., Kulharia, V., Namboodiri, V.P., Torr, P.H., Dokania, P.K.: Multi-agent diverse generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8513–8521 (2018)
Gong, X., Chang, S., Jiang, Y., Wang, Z.: Autogan: Neural architecture search for generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3224–3234 (2019)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in neural information processing systems, pp. 2672–2680 (2014)
Grassucci, E., Cicero, E., Comminiello, D.: Quaternion generative adversarial networks. arXiv preprint arXiv:2104.09630 (2021)
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of wasserstein gans. In: Advances in neural information processing systems, pp. 5767–5777 (2017)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems, pp. 6626–6637 (2017)
Hoang, Q., Nguyen, T.D., Le, T., Phung, D.: Multi-generator generative adversarial nets. arXiv preprint arXiv:1708.02556 (2017)
Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., Belongie, S.: Stacked generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5077–5086 (2017)
Jin, Y., Zhang, J., Li, M., Tian, Y., Zhu, H., Fang, Z.: Towards the automatic anime characters creation with generative adversarial networks. arXiv preprint arXiv:1708.05509 (2017)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)
Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. Adv. Neural. Inf. Process. Syst. 33, 12104–12114 (2020)
Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)
Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of gans. arXiv preprint arXiv:1705.07215 (2017)
Krizhevsky, A., Hinton, G.: Convolutional deep belief networks on cifar-10. Unpublished Manuscript 40(7), 1–9 (2010)
LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann.lecun.com/exdb/mnist/
Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
Li, X., Lin, C., Li, R., Wang, C., Guerin, F.: Latent space factorisation and manipulation via matrix subspace projection. In: International Conference on Machine Learning, pp. 5916–5926. PMLR (2020)
Li, Y., Liu, S., Yang, J., Yang, M.H.: Generative face completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3911–3919 (2017)
Lin, Z., Khetan, A., Fanti, G., Oh, S.P.: The power of two samples in generative adversarial networks. arXiv preprint arXiv:1712.04086 (2017)
Liu, Y., Guo, H.: Peer loss functions: Learning from noisy labels without knowing noise rates. In: International Conference on Machine learning, pp. 6226–6236. PMLR (2020)
Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale celebfaces attributes (celeba) dataset. Retrieved 11 August (2018)
Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: Advances in Neural Information Processing Systems, pp. 406–416 (2017)
Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802 (2017)
Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 (2016)
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)
Mordido, G., Yang, H., Meinel, C.: microbatchgan: Stimulating diversity with multi-adversarial discrimination. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3061–3070 (2020)
Nguyen, T., Le, T., Vu, H., Phung, D.: Dual discriminator generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2670–2680 (2017)
Nowozin, S., Cseke, B., Tomioka, R.: f-gan: Training generative neural samplers using variational divergence minimization. In: Advances in neural information processing systems, pp. 271–279 (2016)
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)
Perarnau, G., Van De Weijer, J., Raducanu, B., Álvarez, J.M.: Invertible conditional gans for image editing. arXiv preprint arXiv:1611.06355 (2016)
Qi, G.J.: Loss-sensitive generative adversarial networks on lipschitz densities. Int. J. Comput. Vision 128(5), 1118–1140 (2020)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training gans. arXiv preprint arXiv:1606.03498 (2016)
Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. In: Advances in Neural Information Processing Systems, pp. 11918–11930 (2019)
Taigman, Y., Polyak, A., Wolf, L.: Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200 (2016)
Tran, N.-T., Bui, T.-A., Cheung, N.-M.: Dist-GAN: an improved GAN using distance constraints. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 387–401. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_23
Tran, N.T., Bui, T.A., Cheung, N.M.: Dist-gan: An improved gan using distance constraints. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 370–385 (2018)
Tran, N.T., Tran, V.H., Nguyen, B.N., Yang, L., Cheung, N.M.M.: Self-supervised gan: analysis and improvement with multi-class minimax game. Adv. Neural. Inf. Process. Syst. 32, 13253–13264 (2019)
Vahdat, A., Kreis, K., Kautz, J.: Score-based generative modeling in latent space. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. In: Advances in neural information processing systems, pp. 613–621 (2016)
Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)
Wei, J., Liu, H., Liu, T., Niu, G., Liu, Y.: Understanding generalized label smoothing when learning with noisy labels. arXiv preprint arXiv:2106.04149 (2021)
Wei, J., Liu, Y.: When optimizing \( f \)-divergence is robust with label noise. arXiv preprint arXiv:2011.03687 (2020)
Wiatrak, M., Albrecht, S.V., Nystrom, A.: Stabilizing generative adversarial networks: A survey. arXiv preprint arXiv:1910.00927 (2019)
Wu, H., Zheng, S., Zhang, J., Huang, K.: Gp-gan: Towards realistic high-resolution image blending. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 2487–2495 (2019)
Xiao, H., Rasul, K., Vollgraf, R.: Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)
Xiao, Z., Yan, Q., Amit, Y.: Generative latent flow. arXiv preprint arXiv:1905.10485 (2019)
Yeh, R.A., Chen, C., Yian Lim, T., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5485–5493 (2017)
Zhang, H., et al.: Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915 (2017)
Zhang, H., Zhang, Z., Odena, A., Lee, H.: Consistency regularization for generative adversarial networks. arXiv preprint arXiv:1910.12027 (2019)
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Acknowledgements
JHW and YL are partially supported by the National Science Foundation (NSF) under grants IIS-2007951, IIS-2143895, and CCF-2023495. MHL, JHL, and JD are supported in part by WISEautomotive through an ATC+ Program award from the Korean Ministry of Trade, Industry and Energy (MOTIE).
Appendix
The appendix is organized as follows:
- Sect. A gives the detailed algorithm of DuelGAN.
- Sect. B includes the omitted proofs for all theoretical conclusions in the main paper.
- Sect. C includes experiment details and additional experiment results.
A The DuelGAN Algorithm
Our introduced duel game between two discriminators is inspired by the peer prediction mechanism, which has shown successful applications in designing robust loss functions [9, 33, 55, 56]. To give a more detailed and practical implementation of the duel game, we summarize the overall DuelGAN algorithm in Algorithm 1. In experiments, we train G to minimize \(\log (1-D_i(G(z)))\), which shares the same fixed points as maximizing \(\log D_i(G(z))\).
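For concreteness, below is a minimal PyTorch-style sketch of one DuelGAN training iteration. The exact Duel-D objective is given in the main paper and Algorithm 1; the `duel_term` used here (a peer-loss-style penalty on re-paired peer labels) and all function names are our illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def duel_term(d_i_out, d_j_out, alpha):
    """Peer-loss-style duel term (our simplification of Duel-D).

    D_i is scored against its peer's hard real/fake decisions; the
    alpha-weighted term on randomly re-paired labels punishes blind
    over-agreement with D_j.
    """
    peer_labels = (d_j_out > 0.5).float()
    agree = F.binary_cross_entropy(d_i_out, peer_labels)
    shuffled = peer_labels[torch.randperm(peer_labels.size(0))]
    return agree - alpha * F.binary_cross_entropy(d_i_out, shuffled)

def duelgan_step(G, D1, D2, opt_g, opt_d1, opt_d2, real, z, alpha, beta):
    """One simplified DuelGAN iteration (discriminators output probabilities)."""
    fake = G(z).detach()
    for D, D_peer, opt in ((D1, D2, opt_d1), (D2, D1, opt_d2)):
        opt.zero_grad()
        d_real, d_fake = D(real), D(fake)
        with torch.no_grad():                      # peer is fixed for this update
            p_real, p_fake = D_peer(real), D_peer(fake)
        # Vanilla GAN discriminator loss ...
        loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
             + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
        # ... plus the beta-weighted duel game against the peer discriminator.
        loss = loss + beta * (duel_term(d_real, p_real, alpha)
                              + duel_term(d_fake, p_fake, alpha))
        loss.backward()
        opt.step()

    # Generator update against both discriminators (non-saturating form).
    opt_g.zero_grad()
    fake = G(z)
    loss_g = -(torch.log(D1(fake) + 1e-8).mean()
               + torch.log(D2(fake) + 1e-8).mean())
    loss_g.backward()
    opt_g.step()
```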

B Omitted Proofs
B.1 Proof of Proposition 1
We firstly introduce Lemma 1 which helps with the proof of Proposition 1.
Lemma 1
For any \((a,b)\in \mathbb {R}^2\setminus \{(0, 0)\}\), the function \(y\rightarrow a\log (y)+b\log (1-y)\) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).
Proof
Denote by \(f(y):=a\log (y)+b\log (1-y)\); clearly, when \(y=0\) or \(y=1\), \(f(y)=-\infty \). For \(y\in (0,1)\), we have \(f'(y)=\frac{a}{y}-\frac{b}{1-y}\).
Note that \(f'(y)>0\) if \(0<y<\frac{a}{a+b}\) and \(f'(y)<0\) if \(\frac{a}{a+b}<y<1\). Thus, the maximum of f(y) on [0, 1] is \(\max \big (f(0), f(\frac{a}{a+b}), f(1)\big )=f(\frac{a}{a+b})\), i.e., f(y) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).
Now we proceed to prove Proposition 1.
Proof of Proposition 1
Proof
The training criterion for the discriminator \(D_i\), given any generator G, is to maximize the quantity \(\mathcal {L}(D_1, D_2,G)\). Recall that:
We then have:
For \(D_1\) and \(D_2\), according to Lemma 1, the above objective function achieves its maximum in [0, 1] at, respectively:
With the introduction of the Duel Game, the distributions \(p_{\text {data}}\) and \(p_{g}\) of the Vanilla GAN are modified by the presence of \(p_{\text {duel}}\). Thus, we define the corresponding updated distributions in DuelGAN w.r.t. discriminator \(D_i\) as \(p_{\text {data}_i}\) and \(p_{g_i}\), respectively:
B.2 Proof of Theorem 1
Proof
When \(\alpha =0\) and \(r_{j,G}(x)=1/2\), for \(i=1,2\), we have:
This allows us to rewrite C(G)/2 as:
\(\Longrightarrow \) Note that \(2\cdot \big (\mathbb E_{x\thicksim p_{\text {data}_i}}[-\log 2]+\mathbb E_{x\thicksim p_{g_i}}[-\log 2]\big )=-\log {16}\). Subtracting this expression from C(G), we have:
where KL is the Kullback-Leibler divergence. Note that:
Since the Jensen-Shannon divergence between two distributions is always non-negative, and zero only when they are equal, we have shown that \(C(G)^*=-\log {16}\) is the global minimum of C(G). Thus, we need
\(\Longleftarrow \) Given that \(p_{\text {data}}=p_{g}\), we have:
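For comparison, the single-discriminator analogue of this decomposition from [16], which the two-discriminator computation above mirrors (hence the constant \(-\log 16\) in place of \(-\log 4\)), reads:

```latex
C(G) = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{\text{data}} \,\|\, p_{g}\right),
\qquad
\mathrm{JSD}(p\,\|\,q) = \tfrac{1}{2}\,\mathrm{KL}\!\Big(p \,\Big\|\, \tfrac{p+q}{2}\Big)
                       + \tfrac{1}{2}\,\mathrm{KL}\!\Big(q \,\Big\|\, \tfrac{p+q}{2}\Big).
```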
B.3 Proof of Theorem 2
Proof
For the Vanilla GAN, \((\epsilon , \delta )\)-mode collapse means there exists a set S in the instance domain \(\mathcal {X}\) such that \(p_{\text {data}}(S)\ge \delta \) and \(p_{\text {g}}(S)\le \epsilon \). For this set S, we have:
For either discriminator \(D_i\) in the DuelGAN, we have:
We then have:
And:
B.4 Proof of Theorem 3
Proof
Ignoring the weight \(\beta \), the duel term of discriminator \(D_i\) w.r.t. its diverged peer discriminator \(\widetilde{D}_{j}\) becomes:
where we define:
for a clear presentation. Continuing the previous derivation, we then have:
Thus,
Note that:
Thus, given \(\alpha =1\), the Bias term is cancelled out. When \(e_{\text {data}, j}+e_{g, j}<1\), we have:
and we further have:
B.5 Proof of Theorem 4
Proof
When \(\beta =0\), the overall min-max game becomes:
Since we assume enough capacity, the inner max is achieved if and only if \(D_1(x)=D_2(x)=\frac{p_{\text {data}}(x)}{p_{\text {data}}(x)+p_g(x)}\). To prove that \(p_g\) converges to \(p_{\text {data}}\), it suffices to reproduce the proof of Proposition 2 in [16]. We omit the details here.
C Experiment Details and Additional Results
C.1 Architecture Comparison Between GAN, D2GAN and DuelGAN
Fig. 5 shows the architecture designs of the single-discriminator GAN, the dual-discriminator D2GAN, and our proposed DuelGAN. Compared with the Vanilla GAN, DuelGAN has one additional, identical discriminator and a competitive Duel Game between the two discriminators. The introduced Duel Game induces diversified generated samples by discouraging the agreement between \(D_1\) and \(D_2\). In D2GAN, although the two discriminators are trained with different loss functions, they do not interact with each other during training.
C.2 Additional Experiment Results
StyleGAN-ADA [24] is the state-of-the-art method in image generation. We applied our duel game to StyleGAN-ADA and further improved its performance: on the CelebA [34] dataset, FID improves from 4.85 to 4.52, and on the FFHQ-10k [25] dataset from 7.24 to 6.01. We show generated images (trained on CelebA) in Fig. 6.
C.3 Additional Experiment Details
Model Architectures. For the small-scale datasets, we use a shallow version of the generator and discriminator: three convolution layers in the generator and four in the discriminators. For natural scene and human face image generation, we use a deep version with three convolution layers in the generator and seven layers in the discriminators; the deep version is the original design of DCGAN [45]. The peer discriminator is a duplicate of the first discriminator.
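As a rough illustration of the shallow configuration described above, the sketch below builds a four-layer convolutional discriminator; the peer discriminator is simply a second instance of the same module. The 64x64 input resolution, channel widths, and kernel sizes are assumptions and may differ from the released code.

```python
import torch.nn as nn

def shallow_discriminator(channels=3, width=64):
    """Assumed four-conv-layer discriminator mapping a 64x64 image to a probability."""
    return nn.Sequential(
        nn.Conv2d(channels, width, 4, stride=2, padding=1),       # 64 -> 32
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(width, width * 2, 4, stride=2, padding=1),      # 32 -> 16
        nn.BatchNorm2d(width * 2),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(width * 2, width * 4, 4, stride=2, padding=1),  # 16 -> 8
        nn.BatchNorm2d(width * 4),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(width * 4, 1, 8),                               # 8 -> 1 score
        nn.Flatten(),
        nn.Sigmoid(),
    )

# The peer discriminator D2 is a second copy of the same architecture.
D1, D2 = shallow_discriminator(), shallow_discriminator()
```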
Hyper-parameters. DuelGAN achieves low FID scores when \(\alpha \) and \(\beta \) are simply set to constant values. However, we found that we could obtain an approximately 10% improvement through dynamic tuning. The parameter \(\beta \) controls the overall weight of \(\text {Duel-D}\), while \(\alpha \) punishes the condition when \(D_1\) over-agrees with \(D_2\). In the early training phase, when the generator and discriminators are still unstable, we set \(\alpha \) and \(\beta \) to 0. As training progresses, we gradually increase these parameters to a maximum value, which helps with vanishing gradients. After the midpoint of training we decrease these parameters to help the discriminators converge, until they reach approximately 0 at the end of training. We adopt 0.3 and 0.5 as the maximum values for \(\alpha \) and \(\beta \), respectively.
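A minimal sketch of such a dynamic schedule is shown below, assuming a linear ramp to the maximum value at mid-training followed by a linear decay back toward 0; the exact schedule shape is our assumption, not a specification from the paper.

```python
def duel_weight_schedule(step, total_steps, max_alpha=0.3, max_beta=0.5):
    """Assumed linear ramp-up/decay for the duel-game weights.

    Returns (alpha, beta): 0 at the start of training, the maximum value at the
    midpoint, and approximately 0 again at the end, matching the qualitative
    description above.
    """
    progress = step / max(total_steps, 1)
    if progress <= 0.5:
        scale = progress / 0.5          # ramp up during the first half
    else:
        scale = (1.0 - progress) / 0.5  # decay during the second half
    return max_alpha * scale, max_beta * scale

# Example: query the schedule inside the training loop.
alpha, beta = duel_weight_schedule(step=25_000, total_steps=100_000)
```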
C.4 Ablation Study of DuelGAN
During training, we initialize \(\alpha \) and \(\beta \) to 0 and gradually increase them to the set maximum values. We experimentally find that \(\alpha \) = 0.3 and \(\beta \) = 0.5 achieve the best FID score on the datasets we tested. Table 3 shows a thorough ablation of different hyper-parameter settings on the STL-10 dataset. The bold entries mark the best \(\alpha \) setting when \(\beta \) is fixed.
C.5 Stability of Training
We now empirically show the stability of the DuelGAN training procedure, adopting the STL-10 dataset and \(\beta =0.25\) for illustration. In Figs. 8 and 9, we visualize the losses of the two discriminators during training on STL-10. The red lines indicate the smoothed trend of the loss evaluated on the generated and real images; the actual losses are shown as shaded red lines. Although there exist certain unstable episodes (where the gap between the smoothed and actual loss is large) for both discriminators, the overall trend of both discriminators is stable. Moreover, we observe that \(D_1\) and \(D_2\) hardly ever experience unstable episodes at the same time. This phenomenon further validates our conclusion in Theorem 2: an unstable/diverged discriminator hardly disrupts the training of its peer discriminator.
Agreement Between the Two Discriminators. We also empirically estimate the agreement level between the two discriminators during training. In Fig. 10, the \(y\)-axis denotes the percentage of predictions on which \(D_1\) and \(D_2\) reach a consensus. The smoothed curve depicts the overall change of the agreement level. At the initial stage, \(D_i\) is not encouraged to over-agree with its peer discriminator \(D_j\). As training progresses, the agreement level gradually increases to a high value to help convergence. The shaded red line shows that the practical agreement level fluctuates around the smoothed line, which incurs a certain degree of randomness and prevents the discriminators from getting stuck in a local optimum.
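For illustration, one simple way to estimate such an agreement level (our own sketch; not necessarily the exact measurement behind Fig. 10) is to threshold both discriminators' outputs at 0.5 and count coinciding decisions on a batch:

```python
import torch

def agreement_level(D1, D2, batch):
    """Fraction of samples on which D1 and D2 make the same real/fake decision.

    Assumes both discriminators output probabilities in [0, 1]; this is an
    illustrative metric, not necessarily the one used in the paper.
    """
    with torch.no_grad():
        votes1 = (D1(batch) > 0.5)
        votes2 = (D2(batch) > 0.5)
    return (votes1 == votes2).float().mean().item()
```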