
DuelGAN: A Duel Between Two Discriminators Stabilizes the GAN Training

Conference paper, Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

In this paper, we introduce DuelGAN, a generative adversarial network (GAN) solution to improve the stability of the generated samples and to mitigate mode collapse. Built upon the Vanilla GAN's two-player game between the discriminator \(D_1\) and the generator G, we introduce a peer discriminator \(D_2\) to the min-max game. Similar to previous work using two discriminators, the first role of both \(D_1\) and \(D_2\) is to distinguish between generated samples and real ones, while the generator tries to generate high-quality samples which are able to fool both discriminators. Different from existing methods, we introduce a duel between \(D_1\) and \(D_2\) to discourage their agreement and therefore increase the diversity of the generated samples. This property alleviates the issue of early mode collapse by preventing \(D_1\) and \(D_2\) from converging too fast. We provide theoretical analysis for the equilibrium of the min-max game formed among \(G,D_1,D_2\), and characterize the convergence behavior of DuelGAN as well as the stability of the min-max game. Notably, DuelGAN operates in the unsupervised setting, and the duel between \(D_1\) and \(D_2\) does not need any label supervision. Experimental results on a synthetic dataset and on real-world image datasets (MNIST, Fashion MNIST, CIFAR-10, STL-10, CelebA, VGG) demonstrate that DuelGAN outperforms competitive baseline work in generating diverse and high-quality samples, while introducing only negligible computation cost. Our code is publicly available at https://github.com/UCSC-REAL/DuelGAN.

J. Wei and M. Liu—Equal contributions.


References

  1. Albuquerque, I., Monteiro, J., Doan, T., Considine, B., Falk, T., Mitliagkas, I.: Multi-objective training of generative adversarial networks with multiple discriminators. arXiv preprint arXiv:1901.08680 (2019)

  2. Aneja, J., Schwing, A., Kautz, J., Vahdat, A.: A contrastive learning approach for training variational autoencoder priors. In: Advances in Neural Information Processing Systems, vol. 34 (2021)

  3. Antipov, G., Baccouche, M., Dugelay, J.L.: Face aging with conditional generative adversarial networks. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 2089–2093. IEEE (2017)

  4. Arbel, M., Sutherland, D., Bińkowski, M., Gretton, A.: On gradient regularizers for MMD GANs. In: Advances in Neural Information Processing Systems, pp. 6700–6710 (2018)

  5. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017)

  6. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

  7. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: International Conference on Automatic Face and Gesture Recognition (2018)

  8. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2172–2180 (2016)

  9. Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., Liu, Y.: Learning with instance-dependent label noise: a sample sieve approach. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=2VXyy9mIyU3

  10. Coates, A., Ng, A., Lee, H.: An analysis of single layer networks in unsupervised feature learning. In: AISTATS (2011). https://cs.stanford.edu/acoates/papers/coatesleeng_aistats_011.pdf

  11. Dash, A., Gamboa, J.C.B., Ahmed, S., Liwicki, M., Afzal, M.Z.: TAC-GAN - text conditioned auxiliary classifier generative adversarial network. arXiv preprint arXiv:1703.06412 (2017)

  12. Dieng, A.B., Ruiz, F.J., Blei, D.M., Titsias, M.K.: Prescribed generative adversarial networks. arXiv preprint arXiv:1910.04302 (2019)

  13. Durugkar, I., Gemp, I., Mahadevan, S.: Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673 (2016)

  14. Ghosh, A., Kulharia, V., Namboodiri, V.P., Torr, P.H., Dokania, P.K.: Multi-agent diverse generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8513–8521 (2018)

  15. Gong, X., Chang, S., Jiang, Y., Wang, Z.: AutoGAN: neural architecture search for generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3224–3234 (2019)

  16. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

  17. Grassucci, E., Cicero, E., Comminiello, D.: Quaternion generative adversarial networks. arXiv preprint arXiv:2104.09630 (2021)

  18. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777 (2017)

  19. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)

  20. Hoang, Q., Nguyen, T.D., Le, T., Phung, D.: Multi-generator generative adversarial nets. arXiv preprint arXiv:1708.02556 (2017)

  21. Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., Belongie, S.: Stacked generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5077–5086 (2017)

  22. Jin, Y., Zhang, J., Li, M., Tian, Y., Zhu, H., Fang, Z.: Towards the automatic anime characters creation with generative adversarial networks. arXiv preprint arXiv:1708.05509 (2017)

  23. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)

  24. Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. Adv. Neural Inf. Process. Syst. 33, 12104–12114 (2020)

  25. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019)

  26. Kodali, N., Abernethy, J., Hays, J., Kira, Z.: On convergence and stability of GANs. arXiv preprint arXiv:1705.07215 (2017)

  27. Krizhevsky, A., Hinton, G.: Convolutional deep belief networks on CIFAR-10. Unpublished manuscript 40(7), 1–9 (2010)

  28. LeCun, Y., Cortes, C.: MNIST handwritten digit database (2010). http://yann.lecun.com/exdb/mnist/

  29. Ledig, C., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)

  30. Li, X., Lin, C., Li, R., Wang, C., Guerin, F.: Latent space factorisation and manipulation via matrix subspace projection. In: International Conference on Machine Learning, pp. 5916–5926. PMLR (2020)

  31. Li, Y., Liu, S., Yang, J., Yang, M.H.: Generative face completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3911–3919 (2017)

  32. Lin, Z., Khetan, A., Fanti, G., Oh, S.P.: The power of two samples in generative adversarial networks. arXiv preprint arXiv:1712.04086 (2017)

  33. Liu, Y., Guo, H.: Peer loss functions: learning from noisy labels without knowing noise rates. In: International Conference on Machine Learning, pp. 6226–6236. PMLR (2020)

  34. Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale CelebFaces Attributes (CelebA) dataset. Retrieved 11 August (2018)

  35. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: Advances in Neural Information Processing Systems, pp. 406–416 (2017)

  36. Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2794–2802 (2017)

  37. Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 (2016)

  38. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)

  39. Mordido, G., Yang, H., Meinel, C.: microbatchGAN: stimulating diversity with multi-adversarial discrimination. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3061–3070 (2020)

  40. Nguyen, T., Le, T., Vu, H., Phung, D.: Dual discriminator generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2670–2680 (2017)

  41. Nowozin, S., Cseke, B., Tomioka, R.: f-GAN: training generative neural samplers using variational divergence minimization. In: Advances in Neural Information Processing Systems, pp. 271–279 (2016)

  42. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)

  43. Perarnau, G., Van De Weijer, J., Raducanu, B., Álvarez, J.M.: Invertible conditional GANs for image editing. arXiv preprint arXiv:1611.06355 (2016)

  44. Qi, G.J.: Loss-sensitive generative adversarial networks on Lipschitz densities. Int. J. Comput. Vision 128(5), 1118–1140 (2020)

  45. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)

  46. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. arXiv preprint arXiv:1606.03498 (2016)

  47. Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. In: Advances in Neural Information Processing Systems, pp. 11918–11930 (2019)

  48. Taigman, Y., Polyak, A., Wolf, L.: Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200 (2016)

  49. Tran, N.-T., Bui, T.-A., Cheung, N.-M.: Dist-GAN: an improved GAN using distance constraints. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 387–401. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_23

  50. Tran, N.T., Bui, T.A., Cheung, N.M.: Dist-GAN: an improved GAN using distance constraints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 370–385 (2018)

  51. Tran, N.T., Tran, V.H., Nguyen, B.N., Yang, L., Cheung, N.M.M.: Self-supervised GAN: analysis and improvement with multi-class minimax game. Adv. Neural Inf. Process. Syst. 32, 13253–13264 (2019)

  52. Vahdat, A., Kreis, K., Kautz, J.: Score-based generative modeling in latent space. In: Advances in Neural Information Processing Systems, vol. 34 (2021)

  53. Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. In: Advances in Neural Information Processing Systems, pp. 613–621 (2016)

  54. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807 (2018)

  55. Wei, J., Liu, H., Liu, T., Niu, G., Liu, Y.: Understanding generalized label smoothing when learning with noisy labels. arXiv preprint arXiv:2106.04149 (2021)

  56. Wei, J., Liu, Y.: When optimizing \(f\)-divergence is robust with label noise. arXiv preprint arXiv:2011.03687 (2020)

  57. Wiatrak, M., Albrecht, S.V., Nystrom, A.: Stabilizing generative adversarial networks: a survey. arXiv preprint arXiv:1910.00927 (2019)

  58. Wu, H., Zheng, S., Zhang, J., Huang, K.: GP-GAN: towards realistic high-resolution image blending. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 2487–2495 (2019)

  59. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

  60. Xiao, Z., Yan, Q., Amit, Y.: Generative latent flow. arXiv preprint arXiv:1905.10485 (2019)

  61. Yeh, R.A., Chen, C., Yian Lim, T., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5485–5493 (2017)

  62. Zhang, H., et al.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915 (2017)

  63. Zhang, H., Zhang, Z., Odena, A., Lee, H.: Consistency regularization for generative adversarial networks. arXiv preprint arXiv:1910.12027 (2019)

  64. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)

Download references

Acknowledgements

JHW and YL are partially supported by the National Science Foundation (NSF) under grants IIS-2007951, IIS-2143895, and CCF-2023495. MHL, JHL, and JD are supported in part by WISEautomotive through an ATC+ Program award from the Korean Ministry of Trade, Industry and Energy (MOTIE).

Author information

Correspondence to Yang Liu.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 144 KB)

Appendices

Appendix

The appendix is organized as follows:

  • Sect. A gives the detailed algorithm of DuelGAN.

  • Sect. B includes the omitted proofs for all theoretical conclusions in the main paper.

  • Sect. C includes experiment details and additional experiment results.

A The DuelGAN Algorithm

Our introduced duel game between two discriminators is inspired by the peer prediction mechanism, which has shown successful applications in designing robust loss functions [9, 33, 55, 56]. To give a detailed and practical implementation of the duel game, we summarize the overall DuelGAN algorithm in Algorithm 1; a code sketch follows it. In experiments, we train G to minimize \(\log (1-D_i(G(z)))\), which has the same fixed points as maximizing \(\log D_i(G(z))\).

[Algorithm 1: the DuelGAN training procedure, rendered as a figure in the original.]
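Since Algorithm 1 appears only as a figure, the following PyTorch-style sketch illustrates the discriminator objective of Eq. (19). It is a minimal sketch under stated assumptions: the helper names are ours, \(\ell \) is taken to be the log-likelihood reward \(\ell (d,y)=y\log d+(1-y)\log (1-d)\) (consistent with Eqs. (30) and (31)), a real/generated mixture stands in for \(p_{\text {duel}}\), and the discriminators are assumed to output probabilities of shape (B, 1).

```python
import torch
import torch.nn.functional as F

def log_loss(pred, target):
    # ell(D(x), y) = y*log D(x) + (1-y)*log(1-D(x)); this is the negative
    # binary cross entropy, and Eq. (19) maximizes it.
    return -F.binary_cross_entropy(pred, target)

def duel_term(d_i, d_j, x, alpha):
    """Duel(D_i) from Eq. (19): reward agreement with the peer's hard label
    on the same sample; penalize (with weight alpha) agreement on an
    independently drawn pair (x_{p1}, x_{p2})."""
    with torch.no_grad():
        peer = (d_j(x) > 0.5).float()            # 1(D_j(x) > 1/2)
    p1, p2 = torch.randperm(x.size(0)), torch.randperm(x.size(0))
    with torch.no_grad():
        peer_p2 = (d_j(x[p2]) > 0.5).float()     # 1(D_j(x_{p2}) > 1/2)
    return log_loss(d_i(x), peer) - alpha * log_loss(d_i(x[p1]), peer_p2)

def discriminator_loss(d1, d2, g, x_real, z, alpha, beta):
    x_fake = g(z).detach()
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_fake.size(0), 1)
    # Vanilla GAN terms for both discriminators (maximized in Eq. (19),
    # hence the final negation to obtain a loss to minimize).
    gan = sum(log_loss(d(x_real), ones) + log_loss(d(x_fake), zeros)
              for d in (d1, d2))
    x_duel = torch.cat([x_real, x_fake])         # stand-in for p_duel
    duel = duel_term(d1, d2, x_duel, alpha) + duel_term(d2, d1, x_duel, alpha)
    return -(gan + beta * duel)
```

A corresponding generator step then minimizes \(\log (1-D_i(G(z)))\), summed over \(i=1,2\), as noted above.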

B Omitted Proofs

1.1 B.1 Proof of Proposition 1

We firstly introduce Lemma 1 which helps with the proof of Proposition 1.

Lemma 1

For any \((a,b)\in \mathbb {R}^2\setminus \{(0, 0)\}\), the function \(y\rightarrow a\log (y)+b\log (1-y)\) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).

Proof

Denote \(f(y):=a\log (y)+b\log (1-y)\). Clearly, when \(y=0\) or \(y=1\), \(f(y)=-\infty \). For \(y\in (0,1)\), we have:

$$\begin{aligned} f'(y)=0\Longleftrightarrow \frac{a}{y}-\frac{b}{1-y}=0 \Longleftrightarrow y=\frac{a}{a+b}. \end{aligned}$$
(18)

Note that \(f'(y)>0\) if \(0<y<\frac{a}{a+b}\) and \(f'(y)<0\) if \(\frac{a}{a+b}<y<1\). Thus, the maximum of f(y) on [0, 1] is \(\max \big (f(0), f(\frac{a}{a+b}), f(1)\big )=f(\frac{a}{a+b})\), i.e., f(y) achieves its maximum in [0, 1] at \(\frac{a}{a+b}\).
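A quick numerical check of Lemma 1 (our addition; it assumes \(a,b>0\), and the values below are arbitrary):

```python
import numpy as np

a, b = 3.0, 1.0
y = np.linspace(1e-6, 1 - 1e-6, 1_000_000)
f = a * np.log(y) + b * np.log(1 - y)
# The grid argmax should sit at a/(a+b) = 0.75.
print(y[np.argmax(f)], a / (a + b))   # ~0.75  0.75
```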

Now we proceed to prove Proposition 1.

Proof of Proposition 1

Proof

The training criterion for the discriminator \(D_i\), given any generator G, is to maximize the quantity \(\mathcal {L}(D_1, D_2,G)\). Recall that:

$$\begin{aligned}&\mathcal {L}(D_1, D_2, G)\nonumber \\&= \mathbb E_{x\thicksim p_{\text {data}}}\left[ \log D_1(x)\right] +\mathbb E_{x\thicksim p_{\text {data}}}\left[ \log D_2(x)\right] \nonumber \\&+\mathbb E_{z\thicksim p_{z}}\left[ \log \left( 1-D_1\left( G(z)\right) \right) \right] +\mathbb E_{z\thicksim p_{z}}\left[ \log \left( 1-D_2\left( G(z)\right) \right) \right] \nonumber \\&+\beta \cdot \mathbb E_{x\thicksim p_{\text {duel}}}\Bigg [\ell \Big (D_1(x),\mathbbm {1}\big (D_2(x)>\dfrac{1}{2}\big )\Big )-\alpha \cdot \ell \Big (D_1(x_{p_1}),\mathbbm {1}\big (D_2(x_{p_2})>\dfrac{1}{2}\big )\Big )\Bigg ]\nonumber \\&+\beta \cdot \mathbb E_{x\thicksim p_{\text {duel}}}\Bigg [\ell \Big (D_2(x),\mathbbm {1}\big (D_1(x)>\dfrac{1}{2}\big )\Big )-\alpha \cdot \ell \Big (D_2(x_{p_1}),\mathbbm {1}\big (D_1(x_{p_2})>\dfrac{1}{2}\big )\Big )\Bigg ]. \end{aligned}$$
(19)

We then have:

$$\begin{aligned} \begin{aligned} \text {Eqn.}(19)&= \int _{x}p_{\text {data}}(x)\big [\log \big (D_1(x)\big )+\log \big (D_2(x)\big )\big ]dx\\&+\int _{z}p_{z}(z)\Big [\log \Big (1-D_1\big (G(z)\big )\Big )+\log \Big (1-D_2\big (G(z)\big )\Big )\Big ]dz\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (r_{2,G}(x)-\alpha \cdot p_{2,G}\big )\cdot \log \big (D_1(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (r_{1,G}(x)-\alpha \cdot p_{1,G}\big )\cdot \log \big (D_2(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (1-\alpha -r_{2,G}(x)+\alpha \cdot p_{2,G}\big )\cdot \log \big (1-D_1(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (1-\alpha -r_{1,G}(x)+\alpha \cdot p_{1,G}\big )\cdot \log \big (1-D_2(x)\big )dx\\&= \int _{x}p_{\text {data}}(x)\big [\log \big (D_1(x)\big )+\log \big (D_2(x)\big )\big ]dx\\&+\int _{x}p_{g}(x)\big [\log \big (1-D_1(x)\big )+\log \big (1-D_2(x)\big )\big ]dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (r_{2,G}(x)-\alpha \cdot p_{2,G}\big )\cdot \log \big (D_1(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (r_{1,G}(x)-\alpha \cdot p_{1,G}\big )\cdot \log \big (D_2(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (1-\alpha -r_{2,G}(x)+\alpha \cdot p_{2,G}\big )\cdot \log \big (1-D_1(x)\big )dx\\&+\beta \cdot \int _{x} p_{\text {duel}}(x)\big (1-\alpha -r_{1,G}(x)+\alpha \cdot p_{1,G}\big )\cdot \log \big (1-D_2(x)\big )dx\\&= \int _{x} \big [p_{\text {data}}(x)+\beta \cdot \big (r_{2,G}(x)-\alpha \cdot p_{2,G}\big )\cdot p_{\text {duel}}(x)\big ] \cdot \log \big (D_1(x)\big ) dx\\&+ \int _{x} \big [p_{g}(x)+\beta \cdot \big (1-\alpha -r_{2,G}(x)+\alpha \cdot p_{2,G}\big )\cdot p_{\text {duel}}(x)\big ] \cdot \log (1-D_1(x)) dx\\&+ \int _{x} \big [p_{\text {data}}(x)+\beta \cdot \big (r_{1,G}(x)-\alpha \cdot p_{1,G}\big )\cdot p_{\text {duel}}(x)\big ] \cdot \log \big (D_2(x)\big ) dx\\&+ \int _{x} \big [p_{g}(x)+\beta \cdot \big (1-\alpha -r_{1,G}(x)+\alpha \cdot p_{1,G}\big )\cdot p_{\text {duel}}(x)\big ] \cdot \log \big (1-D_2(x)\big ) dx.\\ \end{aligned} \end{aligned}$$
(20)

For each \(D_i\) (\(i=1,2\)), according to Lemma 1, the above objective function achieves its maximum in [0, 1] at:

$$\begin{aligned} D_{i,G}^*(x)=\dfrac{p_{\text {data}}(x)+\beta \cdot (r_{j,G}(x)-\alpha \cdot p_{j, G})\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot (1-\alpha )\cdot p_{\text {duel}}(x)}, \qquad i\ne j. \end{aligned}$$
(21)
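As a sanity check, setting \(\beta =0\) (no duel game) in Eq. (21) recovers the optimal discriminator of the Vanilla GAN [16],

$$\begin{aligned} D_{i,G}^*(x)\big |_{\beta =0}=\dfrac{p_{\text {data}}(x)}{p_{\text {data}}(x)+p_{g}(x)}, \qquad i=1,2, \end{aligned}$$

consistent with the \(\beta =0\) analysis in Appendix B.5.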

With the introduction of the duel game, the distributions \(p_{\text {data}}\) and \(p_{g}\) of the Vanilla GAN objective are modified by the appearance of \(p_{\text {duel}}\). Thus, we define the corresponding updated distributions in DuelGAN w.r.t. discriminator \(D_i\) as \(p_{\text {data}_i}\) and \(p_{g_i}\), respectively:

$$\begin{aligned}&p_{\text {data}_i}(x) :=\dfrac{p_{\text {data}}(x)+\beta \cdot \hat{r}^*_{j,G}(x)\cdot p_{\text {duel}}(x)}{\int _{x}p_{\text {data}}(x)+\beta \cdot \hat{r}^*_{j,G}(x)\cdot p_{\text {duel}}(x)dx},\end{aligned}$$
(22)
$$\begin{aligned}&p_{g_i}(x) :=\dfrac{p_{g}(x)+\beta \cdot \big (1-\hat{r}^*_{j,G}(x)\big )\cdot p_{\text {duel}}(x)}{\int _{x}p_{g}(x)+\beta \cdot \big (1-\hat{r}^*_{j,G}(x)\big )\cdot p_{\text {duel}}(x)dx}. \end{aligned}$$
(23)

1.2 B.2 Proof of Theorem 1

Proof

When \(\alpha =0\) and \(\hat{r}^*_{j,G}(x)=1/2\), for \(i=1,2\), we have:

$$\begin{aligned} D_{i,G}^*(x)&=\dfrac{p_{\text {data}}(x)+\beta \cdot \hat{r}_{j,G}^*(x)\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}\rightarrow \dfrac{p_{\text {data}}(x)+\dfrac{\beta }{2} \cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}. \end{aligned}$$
(24)

This allows us to rewrite C(G)/2 as:

$$\begin{aligned} \frac{C(G)}{2}=&\mathbb E_{x\thicksim p_{\text {data}_i}}\left[ \log \frac{p_{\text {data}}(x)+\frac{\beta }{2}\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}\right] \nonumber \\ {}&+\mathbb E_{x\thicksim p_{g_i }}\left[ \log \frac{p_{g}(x)+\frac{\beta }{2}\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}\right] . \end{aligned}$$
(25)

\(\Longrightarrow \) Note that \(2\cdot \big (\mathbb E_{x\thicksim p_{\text {data}_i}}[-\log 2]+\mathbb E_{x\thicksim p_{g_i}}[-\log 2]\big )=-\log {16}\). Subtracting this expression from C(G), we have:

$$\begin{aligned} C(G)=&-\log {16}+ 2\cdot KL\Big (p_{g}+ \dfrac{\beta }{2}\cdot p_{\text {duel}}\Big |\Big |\dfrac{p_{\text {data}}+p_{g}+\beta \cdot p_{\text {duel}}}{2}\Big )\nonumber \\&+2\cdot KL\Big (p_{\text {data}}+\dfrac{\beta }{2}\cdot p_{\text {duel}}\Big |\Big |\dfrac{p_{\text {data}}+p_{g}+\beta \cdot p_{\text {duel}}}{2}\Big ), \end{aligned}$$
(26)

where KL is the Kullback-Leibler divergence. In terms of the Jensen-Shannon divergence (JSD), this gives:

$$\begin{aligned} C(G)=-\log {16}+2\cdot JSD\Big (p_{\text {data}}+\dfrac{\beta }{2}\cdot p_{\text {duel}}\Big |\Big |p_{g}+\dfrac{\beta }{2}\cdot p_{\text {duel}}\Big ), \end{aligned}$$
(27)

Since the Jensen-Shannon divergence between two distributions is always non-negative, and zero only when they are equal, \(C(G)^*=-\log {16}\) is the global minimum of C(G), achieved if and only if

$$p_{\text {data}}+\dfrac{\beta }{2}\cdot p_{\text {duel}}=p_{g}+\dfrac{\beta }{2}\cdot p_{\text {duel}}\Leftrightarrow p_{\text {data}}=p_g.$$

\(\Longleftarrow \) Given that \(p_{\text {data}}=p_{g}\), we have:

$$\begin{aligned} C(G)=&\max _{D}\mathcal {L}(G, D_1, D_2)\nonumber \\ =&2\cdot \mathbb E_{x\thicksim p_{\text {data}_i}}\left[ \log \dfrac{p_{\text {data}}(x)+\dfrac{\beta }{2}\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}\right] \nonumber \\&+2\cdot \mathbb E_{x\thicksim p_{g_i}}\left[ \log \dfrac{p_{g}(x)+\dfrac{\beta }{2}\cdot p_{\text {duel}}(x)}{p_{\text {data}}(x)+p_{g}(x)+\beta \cdot p_{\text {duel}}(x)}\right] \nonumber \\ =&2\cdot \left( \log \dfrac{1}{2}+\log \dfrac{1}{2}\right) =-\log {16}. \end{aligned}$$
(28)

1.3 B.3 Proof of Theorem 2

Proof

By the \((\epsilon , \delta )\)-mode collapse of the Vanilla GAN, there exists a set S in the instance domain \(\mathcal {X}\) such that \(p_{\text {data}}(S)\ge \delta \) and \(p_{\text {g}}(S)\le \epsilon \). For this set S, we have:

$$\begin{aligned} 1\ge \mathbb {E}_{x\in S}~ [p_{\text {data}}(x)] \ge \delta , \qquad 0\le \mathbb {E}_{x\in S} ~ [p_{\text {g}}(x)] \le \epsilon . \end{aligned}$$

For either discriminator \(D_i\) in the DuelGAN, we have:

$$\begin{aligned}&p_{\text {data}_i}(x) :=\dfrac{p_{\text {data}}(x)+\beta \cdot \hat{r}^*_{j,G}(x)\cdot p_{\text {duel}}(x)}{\int _{x}p_{\text {data}}(x)+\beta \cdot \hat{r}^*_{j,G}(x)\cdot p_{\text {duel}}(x)dx},\\&p_{\text {g}_i}(x) :=\dfrac{p_{\text {g}}(x)+\beta \cdot (1-\hat{r}^*_{j,G}(x))\cdot p_{\text {duel}}(x)}{\int _{x}p_{\text {g}}(x)+\beta \cdot (1-\hat{r}^*_{j,G}(x))\cdot p_{\text {duel}}(x)dx}. \end{aligned}$$

We then have:

$$\begin{aligned}&\mathbb {E}_{x\in S}~ [p_{\text {data}_i}(x)]=\,\mathbb {E}_{x\in S}~ [c_{i,1}(x)\cdot p_{\text {data}}(x)]+ \mathbb {E}_{x\in S}~ [(1-c_{i,1}(x))\cdot p_{\text {g}}(x)]\\ =\,&\mathbb {E}_{x\in S}~ [ c_{i, 1}(x)] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {data}}(x)]+\text {Cov}(c_{i, 1}, p_{\text {data}})\\&~~+\mathbb {E}_{x\in S}~ [ (1-c_{i, 1}(x))] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {g}}(x)]-\text {Cov}(c_{i, 1}, p_{\text {g}})\\ =\,&\mathbb {E}_{x\in S}~ [ c_{i, 1}(x)] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {data}}(x)]+\mathbb {E}_{x\in S}~ [ (1-c_{i, 1}(x))] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {g}}(x)]+\text {Cov}(c_{i, 1}, p_{\text {data}}-p_{\text {g}})\\ >\,&\mathbb {E}_{x\in S}~ [ c_{i, 1}(x)] \cdot \delta +\mathbb {E}_{x\in S}~ [ (1-c_{i, 1}(x))] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {g}}(x)]+\text {Cov}(c_{i, 1}, p_{\text {data}}-p_{\text {g}}). \end{aligned}$$

And:

$$\begin{aligned}&\mathbb {E}_{x\in S}~ [p_{\text {g}_i}(x)]=\,\mathbb {E}_{x\in S}~ [c_{i,2}(x)\cdot p_{\text {g}}(x)]+ \mathbb {E}_{x\in S}~ [(1-c_{i,2}(x))\cdot p_{\text {data}}(x)]\\ =\,&\mathbb {E}_{x\in S}~ [ c_{i, 2}(x)] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {g}}(x)]+\mathbb {E}_{x\in S}~ [ (1-c_{i, 2}(x))] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {data}}(x)]+\text {Cov}(c_{i, 2}, p_{\text {g}}-p_{\text {data}})\\ <\,&\mathbb {E}_{x\in S}~ [ c_{i, 2}(x)] \cdot \epsilon +\mathbb {E}_{x\in S}~ [ (1-c_{i, 2}(x))] \cdot \mathbb {E}_{x\in S}~ [ p_{\text {data}}(x)]+\text {Cov}(c_{i, 2}, p_{\text {g}}-p_{\text {data}}). \end{aligned}$$

1.4 B.4 Proof of Theorem 3

Proof

Ignoring the weight \(\beta \), the duel term of discriminator \(D_i\) w.r.t. its diverged peer discriminator \(\widetilde{D}_{j}\) becomes:

$$\begin{aligned}&\quad \text {Duel}(D_i)|_{\widetilde{D}_{j}}:= \mathbb E_{x\thicksim p_{\text {duel}}}\Big [\ell \Big (D_i(x),\mathbbm {1}\big (\widetilde{D}_{j}(x)>\dfrac{1}{2}\big )\Big )-\alpha \cdot \ell \Big (D_i(x_{p_1}),\mathbbm {1}\big (\widetilde{D}_{j}(x_{p_2})>\dfrac{1}{2}\big )\Big )\Big ]\\&=\mathbb E_{x\thicksim p_{\text {duel}},Y^*_j=1}\Big [\mathbb {P}(\widetilde{Y}_j=1|Y^*_j=1)\cdot \ell \big (D_{i}(x),1\big )+\mathbb {P}(\widetilde{Y}_j=0|Y^*_j=1)\cdot \ell \big (D_{i}(x),0\big )\Big ] \\&+\mathbb E_{x\thicksim p_{\text {duel}},Y^*_j=0}\Big [\mathbb {P}(\widetilde{Y}_j=1|Y^*_j=0)\cdot \ell \big (D_{i}(x),1\big )+\mathbb {P}(\widetilde{Y}_j=0|Y^*_j=0)\cdot \ell \big (D_{i}(x),0\big )\Big ]\\&-\alpha \cdot \mathbb E_{x_{p_1}\thicksim p_{\text {duel}}}\Big [\mathbb {P}(\widetilde{Y}_{j}=1)\cdot \ell \big (D_{i}(x_{p_1}),1\big )+\mathbb {P}(\widetilde{Y}_{j}=0)\cdot \ell \big (D_{i}(x_{p_1}),0\big )\Big ]\\&=\mathbb E_{x\thicksim p_{\text {duel}},Y^*_j=1}\Big [(1-e_{\text {data}, j})\cdot \ell (D_{i}(x),1)+e_{\text {data}, j}\cdot \ell (D_{i}(x),0)\Big ] \\&+\mathbb E_{x\thicksim p_{\text {duel}},Y^*_j=0}\Big [e_{g,j}\cdot \ell (D_{i}(x),1)+(1-e_{g,j})\cdot \ell (D_{i}(x),0)\Big ]\\&-\alpha \cdot \mathbb E_{x_{p_1}\thicksim p_{\text {duel}}}\Big [\big [\mathbb {P}(Y^*_j=1)\cdot (1-e_{\text {data},j})+\mathbb {P}(Y^*_j=0)\cdot e_{g,j}\big ]\cdot \ell \big (D_{i}(x_{p_1}),1\big )\Big ]\\&-\alpha \cdot \mathbb E_{x_{p_1}\thicksim p_{\text {duel}}}\Big [\big [\mathbb {P}(Y^*_j=1)\cdot e_{\text {data},j}+\mathbb {P}(Y^*_j=0)\cdot (1-e_{g,j})\big ]\cdot \ell \big (D_{i}(x_{p_1}),0\big )\Big ]\\&=\mathbb E_{x\thicksim p_{\text {duel}},Y_j^*=1}\Big [(1-e_{\text {data}, j}-e_{g, j})\cdot \ell (D_{i}(x),1)+e_{\text {data}, j}\cdot \ell (D_{i}(x),0)+e_{g, j}\cdot \ell (D_{i}(x),1)\Big ] \\&+\mathbb E_{x\thicksim p_{\text {duel}},Y_j^*=0}\Big [(1-e_{\text {data}, j}-e_{g,j})\cdot \ell (D_{i}(x),0)+e_{data,j}\cdot \ell (D_{i}(x),0)+e_{g,j}\cdot \ell (D_{i}(x),1)\Big ]\\&-\alpha \cdot \mathbb E_{x_{p_1}\thicksim p_{\text {duel}}}\Big [c_1\cdot \ell \big (D_{i}(x_{p_1}),1\big )\Big ] -\alpha \cdot \mathbb E_{x_{p_1}\thicksim p_{\text {duel}}}\Big [c_2\cdot \ell \big (D_{i}(x_{p_1}),0\big )\Big ], \end{aligned}$$

where we define:

$$\begin{aligned}&c_1:=\mathbb {P}(Y^*_j=1)\cdot (1-e_{\text {data},j}-e_{g,j})+\mathbb {P}(Y^*_j=0)\cdot e_{g,j}+\mathbb {P}(Y^*_j=1)\cdot e_{g,j},\\&c_2:=\mathbb {P}(Y^*_j=0)\cdot (1-e_{\text {data},j}-e_{g,j})+\mathbb {P}(Y^*_j=1)\cdot e_{\text {data},j}+\mathbb {P}(Y^*_j=0)\cdot e_{\text {data},j}, \end{aligned}$$

for a clear presentation. Continuing the derivation above, we then have:

$$\begin{aligned} \text {Duel}(D_i)|_{\widetilde{D}_{j}}&=(1-e_{\text {data}, j}-e_{g, j})\cdot \mathbb E_{x\thicksim p_{\text {duel}}}\Big [ \ell \big (D_{i}(x),Y^*_j\big )\Big ]\nonumber \\&+\mathbb E_{x\thicksim p_{\text {duel}}}\Big [e_{\text {data}, j}\cdot \ell \big (D_{i}(x),0\big )+e_{g, j}\cdot \ell \big (D_{i}(x),1\big )\Big ]\nonumber \\&-\alpha \cdot (1-e_{\text {data},j}-e_{g,j})\cdot \mathbb E_{x\thicksim p_{\text {duel}}}\Big [ \ell \big (D_{i}(x_{p_1}),Y^*_j\big )\Big ]\nonumber \\&-\alpha \cdot \mathbb E_{x\thicksim p_{\text {duel}}}\Big [e_{\text {data}, j}\cdot \ell \big (D_{i}(x),0\big )+e_{g, j}\cdot \ell \big (D_{i}(x),1\big )\Big ]. \end{aligned}$$
(29)

Thus,

$$\begin{aligned} \text {Duel}(D_i)|_{\widetilde{D}_{j}}=&(1-e_{\text {data}, j}-e_{g, j})\cdot \text {Duel}(D_i)|_{D^*_{j,G}}\nonumber \\ +&\underbrace{(1-\alpha )\cdot \mathbb E_{x\thicksim p_{\text {duel}}} \big [e_{\text {data}, j}\cdot \ell \big (D_{i}(x),0\big )+e_{g, j}\cdot \ell \big (D_{i}(x),1\big )\big ]}_{{\textbf {Bias}}}. \end{aligned}$$
(30)

Note that:

$$\begin{aligned} {\textbf {Bias}}=(1-\alpha )\cdot \mathbb E_{x\thicksim p_{\text {duel}}} \big [e_{\text {data}, j}\cdot \log \big (1-D_{i}(x)\big )+e_{g, j}\cdot \log \big (D_{i}(x)\big )\big ]. \end{aligned}$$
(31)

Thus, given \(\alpha =1\), the Bias term is cancelled out. When \(e_{\text {data}, j}+e_{g, j}<1\), we have:

$$\begin{aligned} \text {Duel}(D_i)|_{\widetilde{D}_{j}}=&(1-e_{\text {data}, j}-e_{g, j})\cdot \text {Duel}(D_i)|_{D^*_{j,G}}, \end{aligned}$$
(32)

and we further have:

$$\begin{aligned} \max _{D_{i}} \text {Duel}(D_i)|_{\widetilde{D}_{j}} =&\max _{D_{i}} \text {Duel}(D_i)|_{D^*_{j,G}}. \end{aligned}$$
(33)
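The reduction in Eqs. (30)–(33) is easy to verify numerically. In the sketch below (our addition; the sampled values of \(D_i\), the peer decisions \(Y^*_j\), and the flip rates \(e_{\text {data},j}, e_{g,j}\) are arbitrary assumptions), setting \(\alpha =1\) makes the duel term against the diverged peer equal \((1-e_{\text {data},j}-e_{g,j})\) times the duel term against the optimal peer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
d = rng.uniform(0.05, 0.95, n)            # D_i(x) on samples from p_duel
y = rng.integers(0, 2, n).astype(float)   # peer's clean decisions Y*_j
e_data, e_g = 0.2, 0.1                    # peer's flip rates

l1, l0 = np.log(d), np.log(1 - d)         # ell(D_i(x), 1), ell(D_i(x), 0)

def duel(p_label):
    """alpha = 1 duel term when the peer outputs label 1 w.p. p_label(x)."""
    agree = np.mean(p_label * l1 + (1 - p_label) * l0)
    # Punishment over independent pairs: marginal label rate times marginal loss.
    punish = p_label.mean() * l1.mean() + (1 - p_label.mean()) * l0.mean()
    return agree - punish

clean = duel(y)                               # peer = D*_{j,G}
noisy = duel(e_g + (1 - e_data - e_g) * y)    # diverged peer
print(np.isclose(noisy, (1 - e_data - e_g) * clean))  # True, matching Eq. (32)
```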

1.5 B.5 Proof of Theorem 4

Proof

When \(\beta =0\), the overall min-max game becomes:

$$\begin{aligned}&\min _{G}\max _{D_1,D_2}\mathcal {L}(D_1,D_2, G)\nonumber \\ =&\min _{G} \max _{D_1,D_2} \mathbb E_{x\thicksim p_{\text {data}}}\big [\log D_1(x)\big ]+\mathbb E_{z\thicksim p_{z}}\Big [\log \Big (1-D_1\big (G(z)\big )\Big )\Big ]\nonumber \\&\qquad \qquad +\mathbb E_{x\thicksim p_{\text {data}}}\big [\log D_2(x)\big ]+\mathbb E_{z\thicksim p_{z}}\Big [\log \Big (1-D_2\big (G(z)\big )\Big )\Big ]. \end{aligned}$$
(34)

Since we assume enough capacity, the inner max is achieved if and only if \(D_1(x)=D_2(x)=\frac{p_{\text {data}}(x)}{p_{\text {data}}(x)+p_g(x)}\). To prove that \(p_g\) converges to \(p_{\text {data}}\), it suffices to reproduce the proof of Proposition 2 in [16]; we omit the details here.

C Experiment Details and Additional Results


1.1 C.1 Architecture Comparison Between GAN, D2GAN and DuelGAN

Fig. 5 shows the architecture designs of the single-discriminator GAN, dual-discriminator GANs, and our proposed DuelGAN. Compared with the Vanilla GAN, DuelGAN adds an identical second discriminator and a competitive duel game between the two discriminators. The introduced duel game induces diversified generated samples by discouraging agreement between \(D_1\) and \(D_2\). In D2GAN, although the two discriminators are trained with different loss functions, they do not interact with each other during training.

Fig. 5.
figure 5

Architecture comparisons between GAN based method (first row), dual discriminators GAN based method (second row) and DuelGAN (third row).

1.2 C.2 Additional Experiment Results

StyleGAN-ADA [24] is the state-of-the-art method in image generation. We applied our duel game to StyleGAN-ADA and further improved its performance: on the CelebA [34] dataset, FID improved from 4.85 to 4.52, and on FFHQ-10k [25] from 7.24 to 6.01. We show generated images (trained on CelebA) in Fig. 6.

Fig. 6.
figure 6

More CelebA image generation results of applying duel game on StyleGAN-ADA.

1.3 C.3 Additional Experiment Details

Model Architectures. For the small-scale datasets, we use a shallow generator and discriminator: three convolution layers in the generator and four layers in the discriminators. For natural scene and human face image generation, we use a deep generator and discriminator, with three convolution layers in the generator and seven layers in the discriminators; the deep version is the original design of DCGAN [45]. The peer discriminator is an identical duplicate of the first one, as sketched below.
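For concreteness, here is a minimal PyTorch sketch of the shallow pair in the DCGAN style [45]. The channel widths, kernel sizes, and the assumed 32 x 32 RGB resolution are our assumptions; the text above does not specify them.

```python
import torch.nn as nn

def shallow_generator(z_dim: int = 100, ch: int = 64) -> nn.Sequential:
    # Three transposed-convolution layers, mapping z of shape (B, z_dim, 1, 1)
    # to a 32x32 RGB image.
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, ch * 4, 4, 1, 0), nn.BatchNorm2d(ch * 4), nn.ReLU(True),   # 4x4
        nn.ConvTranspose2d(ch * 4, ch * 2, 8, 4, 2), nn.BatchNorm2d(ch * 2), nn.ReLU(True),  # 16x16
        nn.ConvTranspose2d(ch * 2, 3, 4, 2, 1), nn.Tanh(),                                   # 32x32
    )

def shallow_discriminator(ch: int = 64) -> nn.Sequential:
    # Four convolution layers ending in a scalar probability; the peer
    # discriminator D_2 is simply a second, identically built instance.
    return nn.Sequential(
        nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),                                  # 16x16
        nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),     # 8x8
        nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True), # 4x4
        nn.Conv2d(ch * 4, 1, 4, 1, 0), nn.Flatten(), nn.Sigmoid(),                           # (B, 1)
    )
```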

Hyper-parameters. DuelGAN achieves low FID scores when \(\alpha \) and \(\beta \) are simply set to constant values. However, we found that we could obtain an approximately 10% improvement through dynamic tuning. The parameter \(\beta \) controls the overall weight of \(\text {Duel-D}\), while \(\alpha \) punishes the condition when \(D_1\) over-agrees with \(D_2\). In the early training phase, when the generator and discriminators are unstable, we set \(\alpha \) and \(\beta \) to 0. As training progresses, we gradually increase these parameters to a maximum value, which helps with vanishing gradients. After the midpoint of training, we decrease these parameters to help the discriminators converge, until they reach approximately 0 at the end of training. We adopt 0.3 and 0.5 as the maximum values for \(\alpha \) and \(\beta \), respectively; a minimal sketch of one such schedule follows.
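The exact schedule is not spelled out beyond the trend shown in Fig. 7, so the sketch below assumes simple linear ramps; the function name duel_weight and its arguments are hypothetical.

```python
def duel_weight(step: int, total_steps: int, peak: float) -> float:
    """Ramp a duel-game weight linearly from 0 to peak by mid-training,
    then back down toward 0 at the end (the trend shown in Fig. 7)."""
    mid = total_steps / 2
    if step <= mid:
        return peak * step / mid
    return peak * (total_steps - step) / (total_steps - mid)

# Max values adopted in the paper: alpha -> 0.3, beta -> 0.5.
# e.g. a quarter of the way through 100k steps:
print(duel_weight(25_000, 100_000, peak=0.3))  # 0.15
print(duel_weight(25_000, 100_000, peak=0.5))  # 0.25
```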

1.4 C.4 Ablation Study of DuelGAN

During training, we initialize \(\alpha \) and \(\beta \) to 0 and gradually increase them to the set maximum values. We experimentally find that \(\alpha =0.3\) and \(\beta =0.5\) achieve the best FID scores on the datasets we tested. Table 4 shows a thorough ablation of different hyper-parameter settings on the STL-10 dataset; bold text marks the best \(\alpha \) setting when \(\beta \) is fixed.

Fig. 7.
figure 7

The trend of \(\alpha , \beta \) in the training.

Table 4. Ablation study of max \(\alpha \) and max \(\beta \) value tuning on STL-10 dataset (evaluate with FID score).
Fig. 8.
figure 8

The loss of \(D_1\) in DuelGAN with \(\beta =0.25\) on the STL-10 dataset. Left: \(\alpha =0.3\); middle: \(\alpha =0.5\); right: \(\alpha =0.7\).

Fig. 9.
figure 9

The loss of \(D_2\) in DuelGAN with \(\beta =0.25\) on the STL-10 dataset. Left: \(\alpha =0.3\); middle: \(\alpha =0.5\); right: \(\alpha =0.7\).

1.5 C.5 Stability of Training

Now we empirically show the stability of the DuelGAN training procedure, using the STL-10 dataset and \(\beta =0.25\) for illustration. In Figs. 8 and 9, we visualize the losses of the two discriminators during training. The solid red lines indicate the smoothed trend of the loss evaluated on generated and real images; the raw losses are shown as shaded red lines. Although certain unstable episodes exist for both discriminators (where the raw loss departs far from the smoothed loss), the overall trends of both discriminators are stable. Moreover, we observe that \(D_1\) and \(D_2\) hardly ever experience unstable episodes at the same time. This phenomenon further validates our conclusion in Theorem 2: an unstable/diverged discriminator hardly disrupts the training of its peer discriminator!

Agreements Between Two Discriminators. We also empirically estimate the agreement level between the two discriminators during training. In Fig. 10, the \(y\)-axis denotes the percentage of predictions on which \(D_1\) and \(D_2\) reach a consensus. The smoothed curve depicts the overall change in the agreement level. At the initial stage, \(D_i\) is not encouraged to over-agree with its peer discriminator \(D_j\). As training progresses, the agreement level gradually increases to a high value to help convergence. The shaded red line shows that the practical agreement level fluctuates around the smoothed curve, which introduces a certain degree of randomness and prevents the discriminators from getting stuck in a local optimum.

Fig. 10.
figure 10

The agreement level between \(D_1\) and \(D_2\) in DuelGAN with \(\beta =0.25\) on the STL-10 dataset. Left: \(\alpha =0.3\); middle: \(\alpha =0.5\); right: \(\alpha =0.7\).


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wei, J., Liu, M., Luo, J., Zhu, A., Davis, J., Liu, Y. (2022). DuelGAN: A Duel Between Two Discriminators Stabilizes the GAN Training. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_18


  • DOI: https://doi.org/10.1007/978-3-031-20050-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20049-6

  • Online ISBN: 978-3-031-20050-2

