
Capabilities, Limitations and Challenges of Style Transfer with CycleGANs: A Study on Automatic Ring Design Generation

  • Conference paper
Machine Learning and Knowledge Extraction (CD-MAKE 2022)

Abstract

Rendering programs have changed the design process completely, as they allow designers to see how products will look before they are fabricated. However, the rendering process is complicated and takes a significant amount of time, not only in the rendering itself but also in setting up the scene: materials, lights and cameras need to be configured to obtain the best quality results, and the optimal output may not be obtained in the first render. All of this makes rendering a tedious process. Since Goodfellow et al. introduced Generative Adversarial Networks (GANs) in 2014 [1], they have been used to generate synthetic data, from non-existing human faces to medical data analysis or image style transfer. GANs have been used to transfer image textures from one domain to another; however, paired data from both domains was needed. When Zhu et al. introduced the CycleGAN model, the elimination of this expensive constraint made it possible to transform an image from one domain into another without the need for paired data. This work validates the applicability of CycleGANs to style transfer from an initial sketch to a final 2D render that represents a 3D design, a step that is paramount in every product design process. We inquire into the possibilities of including CycleGANs as part of the design pipeline, more precisely, applied to the rendering of ring designs. Our contribution addresses a crucial part of the process, as it allows the customer to see the final product before buying. This work sets a basis for future research, showing the possibilities of GANs in design and establishing a starting point for novel applications that approach crafts design.


Notes

  1. Arca will use AI to soundtrack NYC’s Museum of Modern Art, https://www.engadget.com/2019-10-17-arca-ai-soundtrack-for-nyc-moma.html.

  2. Untitled Computer Drawing, by Harold Cohen, 1982, Tate (n.d.), https://www.tate.org.uk/art/artworks/cohen-untitled-computer-drawing-t04167.

  3. Design Thinking, https://hbr.org/2008/06/design-thinking.

  4. Why IKEA Uses 3D Renders vs. Photography for Their Furniture Catalog, https://www.cadcrowd.com/blog/why-ikea-uses-3d-renders-vs-photography-for-their-furniture-catalog.

  5. The XYU ring project is key to understanding this work; more information at https://tomascabezon.com/. XYU is the name of this project; it is not an acronym, but the name of this ring, composed of 3 randomly chosen letters.

  6. https://github.com/tcabezon/automatic-ring-design-generation-cycleGAN, https://tcabezon.github.io/automatic-ring-design-generation-cycleGAN/.

References

  1. Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 1–9 (2014)

  2. Thies, J., Zollhöfer, M., Theobalt, C., Stamminger, M., Nießner, M.: IGNOR: image-guided neural object rendering. arXiv preprint arXiv:1811.10720 (2018)

  3. Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhöfer, M.: DeepVoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2437–2446 (2019)

  4. Dupont, E., et al.: Equivariant neural rendering. arXiv preprint arXiv:2006.07630 (2020)

  5. Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015)

  6. Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  7. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2016)

  8. Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8798–8807 (2018)

  9. Wang, C., Zheng, H., Yu, Z., Zheng, Z., Gu, Z., Zheng, B.: Discriminative region proposal adversarial networks for high-quality image-to-image translation. In: Proceedings of the European Conference on Computer Vision (ECCV), September 2018

  10. Park, T., Liu, M.-Y., Wang, T.-C., Zhu, J.-Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  11. Schönfeld, E., Sushko, V., Zhang, D., Gall, J., Schiele, B., Khoreva, A.: You only need adversarial supervision for semantic image synthesis. In: International Conference on Learning Representations (2021)

  12. Lütjens, B., et al.: Physically-consistent generative adversarial networks for coastal flood visualization. arXiv preprint arXiv:2104.04785 (2021)

  13. Lütjens, B., et al.: Physics-informed GANs for coastal flood visualization. arXiv preprint arXiv:2010.08103 (2020)

  14. Casale, F.P., Dalca, A., Saglietti, L., Listgarten, J., Fusi, N.: Gaussian process prior variational autoencoders. In: Advances in Neural Information Processing Systems, pp. 10369–10380 (2018)

  15. Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning (ICML), vol. 37, pp. 1530–1538 (2015)

  16. Lugmayr, A., Danelljan, M., Van Gool, L., Timofte, R.: SRFlow: learning the super-resolution space with normalizing flow. In: ECCV (2020)

  17. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR) (2014)

  18. Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: Advances in Neural Information Processing Systems, vol. 29, pp. 658–666. Curran Associates Inc. (2016)

  19. Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 465–476 (2017)

  20. Pedroso, T.C.: Utilización de métodos aleatorios en la generación de formas geométricas [Use of random methods in the generation of geometric shapes]. Master’s thesis, Universidad Politécnica de Madrid, Spain (2003). https://oa.upm.es/69208/

  21. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)

  22. Odena, A., Dumoulin, V., Olah, C.: Deconvolution and checkerboard artifacts. Distill 1(10), e3 (2016)

  23. Murasugi, K., Kurpita, B.: Knot Theory and Its Applications. Springer, Boston (1996). https://doi.org/10.1007/978-0-8176-4719-3


Acknowledgments

Díaz-Rodríguez is supported by IJC2019-039152-I funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”, and by the Google Research Scholar Program. Del Ser is funded by the Basque Government ELKARTEK program (3KIA project, KK-2020/00049) and the research group MATHMODE (T1294-19).

Author information

Correspondence to Natalia Díaz-Rodríguez.

A Appendix: Supplementary Materials

A.1 Datasets

Some randomly selected .jpg images from the datasets generated for this work are shown in this section, to illustrate the diversity of the images used to train the CycleGAN.

Sketch2Rendering: 179 sketch images and 176 rendered images were used for training. The images were scaled to 400\(\,\times \,\)400 pixels when loaded. The sketch dataset is composed of .jpg images generated with Matlab and the XYU ring algorithm (Fig. 14, left). Each image is a 3D plot of the splines that compose a ring, all drawn with the same line thickness within an image; this thickness is varied across images to represent different ring thicknesses.

Rendered Dataset: The images in Fig. 14 (right) have been created using the Blender rendering software. As can be seen, although the background color is always the same blue (#B9E2EA), the lighting setup, as well as the camera position and orientation, varies across the dataset, and thus different shadows and lights can be appreciated.

Fig. 14.

On the left, random images of wire sketches created with Matlab and used to train the CycleGAN model (domain A). On the right, random images of the rendered dataset, created with the Blender rendering software and used to train the CycleGAN model (domain B).
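To make the dataset setup concrete, the following is a minimal sketch, in Python/PyTorch, of how such unpaired image folders could be loaded and scaled to 400 × 400 pixels. The folder names, the normalisation to [-1, 1] and the PyTorch/torchvision pipeline are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of loading the two unpaired image domains (assumed folder
# names "sketches/" and "renders/"); images are resized to 400x400 as stated
# above. Illustrative only, not the authors' exact data pipeline.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class UnpairedImageDataset(Dataset):
    """Returns one image from domain A and one from domain B per index."""
    def __init__(self, dir_a="sketches", dir_b="renders", size=400):
        self.paths_a = sorted(os.path.join(dir_a, f) for f in os.listdir(dir_a))
        self.paths_b = sorted(os.path.join(dir_b, f) for f in os.listdir(dir_b))
        self.transform = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map to [-1, 1]
        ])

    def __len__(self):
        return max(len(self.paths_a), len(self.paths_b))

    def __getitem__(self, idx):
        # The domains are unpaired, so the two indices are wrapped independently.
        img_a = Image.open(self.paths_a[idx % len(self.paths_a)]).convert("RGB")
        img_b = Image.open(self.paths_b[idx % len(self.paths_b)]).convert("RGB")
        return self.transform(img_a), self.transform(img_b)
```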

A.2 CycleGAN Model

To achieve cycle consistency between two domains, a CycleGAN requires two generators: the first generator (\(G_{AB}\)) translates from domain A to B, and the second generator (\(G_{BA}\)) translates from domain B back to A. Therefore, there are two cycle consistency losses, a forward one and a backward one, which enforce that \(x\approx G_{BA}(G_{AB}(x))\) and \(y\approx G_{AB}(G_{BA}(y))\).

CycleGAN Generator Architecture: The generator in the CycleGAN has layers that implement three stages of computation:

  1. The first stage encodes the input via a series of convolutional layers that extract image features.

  2. The second stage transforms these features by passing them through one or more residual blocks.

  3. The third stage decodes the transformed features using a series of transposed convolutional layers, building an output image of the same size as the input.

The residual block used in transformation stage 2 consists of a convolutional layer, where the input is added to the output of the convolution. This is done so that the characteristics of the output image (e.g., the shapes of objects) do not differ too much from the input. Figure 16 shows the proposed architecture with example paired images as input.
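The three-stage structure described above can be sketched in PyTorch as follows. The channel widths, the number of residual blocks and the use of instance normalisation are illustrative assumptions in the spirit of [21], not the exact published configuration.

```python
# Sketch of the three-stage CycleGAN generator: convolutional encoder,
# residual transformation blocks, transposed-convolution decoder.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # Skip connection: the input is added to the convolved output, so the
        # transformed features stay close to the input's structure.
        return x + self.block(x)

class Generator(nn.Module):
    def __init__(self, in_channels=3, base=64, n_res_blocks=6):
        super().__init__()
        # Stage 1: encode the input with convolutional layers.
        encoder = [
            nn.Conv2d(in_channels, base, kernel_size=7, padding=3),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.ReLU(inplace=True),
        ]
        # Stage 2: transform the features with residual blocks.
        transformer = [ResidualBlock(base * 4) for _ in range(n_res_blocks)]
        # Stage 3: decode back to the input resolution with transposed convolutions.
        decoder = [
            nn.ConvTranspose2d(base * 4, base * 2, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, in_channels, kernel_size=7, padding=3),
            nn.Tanh(),  # outputs in [-1, 1], matching the input normalisation
        ]
        self.model = nn.Sequential(*encoder, *transformer, *decoder)

    def forward(self, x):
        return self.model(x)
```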

CycleGAN Discriminator Architecture: The discriminator of the CycleGAN is based on the PatchGAN architecture [6]. The difference between this architecture and the usual GAN discriminator is that, instead of outputting a single scalar, it outputs a matrix of values, each between 0 (fake) and 1 (real), classifying the corresponding portion of the image (Fig. 15).
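A minimal PatchGAN-style discriminator along these lines is sketched below; the layer configuration is an assumption in the spirit of [6], and a sigmoid is added so that each patch score lies between 0 (fake) and 1 (real), as described above.

```python
# Sketch of a PatchGAN discriminator: instead of a single real/fake score,
# it outputs a grid of scores, one per receptive-field patch of the input.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        def block(c_in, c_out, stride=2, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            *block(base * 4, base * 8, stride=1),
            nn.Conv2d(base * 8, 1, kernel_size=4, stride=1, padding=1),  # one score per patch
            nn.Sigmoid(),  # scores in (0, 1): near 1 = "looks real", near 0 = "looks fake"
        )

    def forward(self, x):
        return self.model(x)

# Each entry of the output matrix scores one patch of the input image.
scores = PatchDiscriminator()(torch.randn(1, 3, 400, 400))
print(scores.shape)  # torch.Size([1, 1, 48, 48]) for a 400x400 input
```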

Fig. 15.

Generator (top) and discriminator (bottom) architectures, with an example of the PatchGAN architecture, part of the CycleGAN discriminator, classifying a portion of the image. In this example, 0.8 is the score the discriminator gave to that patch of the image, i.e., the patch looks closer to a real image (1).

Fig. 16.

Proposed CycleGAN model to learn an unsupervised Sketch2Rendering mapping.

Losses: The objective of CycleGANs is to learn the mapping between domains X and Y given training examples \(x_i\in X\) and \(y_i\in Y\). The data distributions are \(x \sim p_{data}(x)\) and \(y\sim p_{data}(y)\). As shown in Fig. 16, the model includes two mappings, one learned by each generator, \(G_{AB} : X \rightarrow Y \) and \(G_{BA} : Y \rightarrow X \).

Apart from these generators, the model has two discriminators, one for each domain. \(D_A\) learns to distinguish between real images x and fake images \(x^*=G_{BA}(y)\), while \(D_B\) learns to distinguish between real images y and fake images \(y^*=G_{AB}(x)\). The objective function therefore contains two different losses: the adversarial losses [1], which measure whether the distribution of the generated images matches the data distribution of the target domain, and the cycle consistency losses [21], which ensure that \(G_{AB}\) and \(G_{BA}\) do not contradict each other.

Cycle Consistency Loss: It can be expressed as \(||x-G_{BA}(G_{AB}(x))||\) or \(||y-G_{AB}(G_{BA}(y))||\), depending on which domain we take as the starting point. These terms ensure that the original image and the twice-translated image, i.e., the image obtained after completing a full cycle, are the same. This loss function is expressed as:

$$\begin{aligned} \mathcal {L}_{cyc}(G_{AB},G_{BA})=&\mathbb {E}_{x\sim p_{data}(x)}[||G_{BA}(G_{AB}(x))-x||_1] \nonumber \\&+\mathbb {E}_{y\sim p_{data}(y)}[||G_{AB}(G_{BA}(y))-y||_1] \end{aligned}$$
(1)
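A direct translation of Eq. (1) into code might look as follows; this is a sketch, where g_ab and g_ba stand for the two generators and the mean L1 distance plays the role of the norm in Eq. (1).

```python
import torch.nn.functional as F

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b):
    """Eq. (1): reconstruct each image after a full A->B->A (or B->A->B) cycle
    and penalise the L1 distance to the original."""
    rec_a = g_ba(g_ab(real_a))   # forward cycle:  x -> G_AB(x) -> G_BA(G_AB(x))
    rec_b = g_ab(g_ba(real_b))   # backward cycle: y -> G_BA(y) -> G_AB(G_BA(y))
    return F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
```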

Adversarial Loss: Apart from the cycle consistency loss, CycleGANs also use an adversarial loss during training. As in traditional GAN models, the adversarial loss measures whether the generated images look real, i.e., whether they are indistinguishable from images coming from the probability distribution learned from the training set [1]. For the mapping \(G_{AB} : X \rightarrow Y \) and the corresponding discriminator \(D_B\), we express the objective as:

$$\begin{aligned} \mathcal {L}_{GAN}(G_{AB},D_B,X,Y)=&\mathbb {E}_{y\sim p_{data}(y)}[\log D_B(y)] \nonumber \\&+ \mathbb {E}_{x\sim p_{data}(x)}[\log (1- D_B(G_{AB}(x)))] \end{aligned}$$
(2)

Every translation by the \(G_{AB}\) generator is checked by the \(D_{B}\) discriminator, and the output of generator \(G_{BA}\) is assessed by the \(D_{A}\) discriminator. Every time we translate from one domain to the other, the discriminator tests whether the output of the generator looks real or fake, and each generator tries to fool its adversary, the discriminator. While each generator tries to minimize the objective function, the corresponding discriminator tries to maximize it. The training objectives of this loss are \(\min _{G_{AB}}\max _{D_B}\mathcal {L}_{GAN}(G_{AB},D_B,X,Y)\) and \(\min _{G_{BA}}\max _{D_A}\mathcal {L}_{GAN}(G_{BA},D_A,Y,X)\).

Identity Loss: The identity loss measures whether the output of the CycleGAN preserves the overall color temperature and structure of the picture. A pixel-wise distance is used so that, ideally, there is no difference between the output and the input; this encourages the CycleGAN to change only the parts of the image that it needs to.

Model Training: The full objective of the CycleGAN combines these three loss functions; indeed, Zhu et al. show that training the networks with only one of them does not produce high-quality results. In the formula below, the cycle consistency and identity losses are weighted by \(\lambda _{cyc}\) and \(\lambda _{ident}\), respectively; these scalars control the importance of each loss during training. In our case, following the values for these parameters proposed in the original paper [21], \(\lambda _{cyc}\) is set to 10 and \(\lambda _{ident}\) to 0.1, as the identity loss only controls the tint of the background of the input and output images and, since our dataset is composed of the same colors, it does not have a large influence.

$$\begin{aligned} \mathcal {L}(G_{AB},G_{BA},D_A,D_B) =&\mathcal {L}_{GAN}(G_{AB},D_B,X,Y)+\mathcal {L}_{GAN}(G_{BA},D_A,Y,X) \nonumber \\&+\lambda _{cyc}\mathcal {L}_{cyc}(G_{AB},G_{BA}) +\lambda _{ident}\mathcal {L}_{ident}(G_{AB},G_{BA}) \end{aligned}$$
(3)
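The full objective of Eq. (3), with the weights \(\lambda _{cyc}=10\) and \(\lambda _{ident}=0.1\) used in this work, could be assembled as in the sketch below. The use of binary cross-entropy for the adversarial terms follows the log-likelihood form of Eq. (2) and assumes discriminators with outputs in [0, 1]; this is an illustrative assumption, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def generator_objective(g_ab, g_ba, d_a, d_b, real_a, real_b,
                        lambda_cyc=10.0, lambda_ident=0.1):
    """Eq. (3) from the generators' point of view: adversarial terms for both
    mappings plus the weighted cycle consistency and identity terms."""
    fake_b = g_ab(real_a)  # A -> B translation
    fake_a = g_ba(real_b)  # B -> A translation

    # Adversarial terms (Eq. 2): each generator tries to make its discriminator
    # score the fakes as real (target label 1).
    pred_b, pred_a = d_b(fake_b), d_a(fake_a)
    adv = F.binary_cross_entropy(pred_b, torch.ones_like(pred_b)) + \
          F.binary_cross_entropy(pred_a, torch.ones_like(pred_a))

    # Cycle consistency term (Eq. 1), weighted by lambda_cyc = 10.
    cyc = F.l1_loss(g_ba(fake_b), real_a) + F.l1_loss(g_ab(fake_a), real_b)

    # Identity term, weighted by lambda_ident = 0.1: a generator fed an image
    # that is already in its target domain should leave it unchanged.
    ident = F.l1_loss(g_ab(real_b), real_b) + F.l1_loss(g_ba(real_a), real_a)

    return adv + lambda_cyc * cyc + lambda_ident * ident
```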

A.3 CycleGAN Training Details

The networks were trained from scratch with a starting learning rate of 0.0002 for 100 epochs; after this, they were trained for 100 more epochs with a learning rate of 0.00002, as suggested by Zhu et al. in [21]. Following this procedure, the objective loss function of the discriminator D was divided by 2, which slows down the rate at which D learns compared with the generator G.
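One possible way to express this schedule and the halving of the discriminator objective is sketched below. The Adam optimiser settings, the batch loop and the hypothetical discriminator_objective helper (the discriminators' side of Eq. (2), with generated images detached) are assumptions for illustration; g_ab, g_ba, d_a, d_b and loader refer to the sketches above.

```python
import itertools
import torch

# Sketch of the schedule described above: learning rate 2e-4 for the first 100
# epochs, then 2e-5 for another 100, with the discriminator objective divided
# by 2 so that D learns more slowly than G. Optimiser settings are assumptions.
opt_g = torch.optim.Adam(itertools.chain(g_ab.parameters(), g_ba.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(itertools.chain(d_a.parameters(), d_b.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

for epoch in range(200):
    if epoch == 100:  # switch to the lower learning rate after 100 epochs
        for opt in (opt_g, opt_d):
            for group in opt.param_groups:
                group["lr"] = 2e-5

    for real_a, real_b in loader:  # `loader` yields unpaired (A, B) batches
        # Update both generators on the full objective of Eq. (3).
        opt_g.zero_grad()
        generator_objective(g_ab, g_ba, d_a, d_b, real_a, real_b).backward()
        opt_g.step()

        # Update both discriminators; their loss is halved as described above.
        opt_d.zero_grad()
        (0.5 * discriminator_objective(d_a, d_b, g_ab, g_ba, real_a, real_b)).backward()
        opt_d.step()
```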

For the generator and discriminator we adopt the same architectures as the ones proposed by Zhu et al. [21], with the difference that for the first and last layers in the generator, we used a padding of 3 due to the input image size of our dataset.


Copyright information

© 2022 IFIP International Federation for Information Processing


Cite this paper

Cabezon Pedroso, T., Ser, J.D., Díaz-Rodríguez, N. (2022). Capabilities, Limitations and Challenges of Style Transfer with CycleGANs: A Study on Automatic Ring Design Generation. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_11

  • DOI: https://doi.org/10.1007/978-3-031-14463-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14462-2

  • Online ISBN: 978-3-031-14463-9
