
Capabilities, Limitations and Challenges of Style Transfer with CycleGANs: A Study on Automatic Ring Design Generation

  • Conference paper
Machine Learning and Knowledge Extraction (CD-MAKE 2022)

Abstract

Rendering programs have changed the design process completely, as they allow designers to see how products will look before they are fabricated. However, the rendering process is complicated and takes a significant amount of time, not only in the rendering itself but also in setting up the scene: materials, lights and cameras need to be configured to obtain the best quality results, and the optimal output may not be obtained in the first render. All of this makes rendering a tedious process. Since Goodfellow et al. introduced Generative Adversarial Networks (GANs) in 2014 [1], they have been used to generate synthetic data, from non-existing human faces to medical data analysis or image style transfer. GANs have been used to transfer image textures from one domain to another; however, paired data from both domains was needed. When Zhu et al. introduced the CycleGAN model, the elimination of this expensive constraint made it possible to transform an image from one domain into another without the need for paired data. This work validates the applicability of CycleGANs to style transfer from an initial sketch to a final 2D render that represents a 3D design, a step that is paramount in every product design process. We inquire into the possibilities of including CycleGANs as part of the design pipeline, more precisely, applied to the rendering of ring designs. Our contribution addresses a crucial part of the process, as it allows the customer to see the final product before buying. This work sets a basis for future research, showing the possibilities of GANs in design and establishing a starting point for novel applications that approach crafts design.


Notes

  1. Arca will use AI to soundtrack NYC’s Museum of Modern Art, https://www.engadget.com/2019-10-17-arca-ai-soundtrack-for-nyc-moma.html.

  2. Untitled Computer Drawing, by Harold Cohen, 1982, Tate (n.d.), https://www.tate.org.uk/art/artworks/cohen-untitled-computer-drawing-t04167.

  3. Design Thinking, https://hbr.org/2008/06/design-thinking.

  4. Why IKEA Uses 3D Renders vs. Photography for Their Furniture Catalog, https://www.cadcrowd.com/blog/why-ikea-uses-3d-renders-vs-photography-for-their-furniture-catalog.

  5. The XYU ring project is key to understanding this work; more information at https://tomascabezon.com/. XYU is the name of this project; it is not an acronym, but the name of this ring, composed of 3 randomly chosen letters.

  6. https://github.com/tcabezon/automatic-ring-design-generation-cycleGAN, https://tcabezon.github.io/automatic-ring-design-generation-cycleGAN/.

References

  1. Goodfellow, I., et al.: Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 1–9 (2014)

  2. Thies, J., Zollhöfer, M., Theobalt, C., Stamminger, M., Nießner, M.: IGNOR: image-guided neural object rendering. arXiv preprint arXiv:1811.10720 (2018)

  3. Sitzmann, V., Thies, J., Heide, F., Nießner, M., Wetzstein, G., Zollhöfer, M.: DeepVoxels: learning persistent 3D feature embeddings. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2437–2446 (2019)

  4. Dupont, E., et al.: Equivariant neural rendering. arXiv preprint arXiv:2006.07630 (2020)

  5. Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576 (2015)

  6. Isola, P., Zhu, J.-Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  7. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2016)

  8. Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8798–8807 (2018)

  9. Wang, C., Zheng, H., Yu, Z., Zheng, Z., Gu, Z., Zheng, B.: Discriminative region proposal adversarial networks for high-quality image-to-image translation. In: Proceedings of the European Conference on Computer Vision (ECCV), September 2018

  10. Park, T., Liu, M.-Y., Wang, T.-C., Zhu, J.-Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  11. Schönfeld, E., Sushko, V., Zhang, D., Gall, J., Schiele, B., Khoreva, A.: You only need adversarial supervision for semantic image synthesis. In: International Conference on Learning Representations (2021)

  12. Lütjens, B., et al.: Physically-consistent generative adversarial networks for coastal flood visualization. arXiv preprint arXiv:2104.04785 (2021)

  13. Lütjens, B., et al.: Physics-informed GANs for coastal flood visualization. arXiv preprint arXiv:2010.08103 (2020)

  14. Casale, F.P., Dalca, A., Saglietti, L., Listgarten, J., Fusi, N.: Gaussian process prior variational autoencoders. In: Advances in Neural Information Processing Systems, pp. 10369–10380 (2018)

  15. Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning (ICML), vol. 37, pp. 1530–1538 (2015)

  16. Lugmayr, A., Danelljan, M., Van Gool, L., Timofte, R.: SRFlow: learning the super-resolution space with normalizing flow. In: ECCV (2020)

  17. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (ICLR) (2014)

  18. Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: Advances in Neural Information Processing Systems, vol. 29, pp. 658–666. Curran Associates Inc. (2016)

  19. Zhu, J.-Y., Zhang, R., Pathak, D., Darrell, T., Efros, A.A., Wang, O., Shechtman, E.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 465–476 (2017)

  20. Pedroso, T.C.: Utilización de métodos aleatorios en la generación de formas geométricas [Use of random methods in the generation of geometric shapes]. Master’s thesis, Universidad Politécnica de Madrid, Spain (2003). https://oa.upm.es/69208/

  21. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)

  22. Odena, A., Dumoulin, V., Olah, C.: Deconvolution and checkerboard artifacts. Distill 1(10), e3 (2016)

  23. Murasugi, K., Kurpita, B.: Knot Theory and Its Applications. Springer, Boston (1996). https://doi.org/10.1007/978-0-8176-4719-3


Acknowledgments

Díaz-Rodríguez is supported by IJC2019-039152-I funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”, and by the Google Research Scholar Program. Del Ser is funded by the Basque Government ELKARTEK program (3KIA project, KK-2020/00049) and the research group MATHMODE (T1294-19).

Author information

Correspondence to Natalia Díaz-Rodríguez.

A Appendix: Supplementary Materials

A.1 Datasets

Some randomly selected .jpg images from the datasets generated for this work are shown in this section, to illustrate the diversity of the images used to train the CycleGAN.

Sketch2Rendering: 179 sketch images and 176 rendered images were used for training. The images were scaled to 400\(\,\times \,\)400 pixels when loaded. The sketch dataset is composed of .jpg images generated with Matlab and the XYU ring algorithm (Fig. 14, left). Each image is a 3D plot of the splines that compose a ring, all drawn with the same line thickness within an image; this thickness is varied across images to represent different ring thicknesses.

Rendered Dataset: The images in Fig. 14 (right) have been created using the Blender rendering software. As can be seen, although the background color is always the same blue (#B9E2EA), the lighting setup, as well as the camera position and orientation, varies across the dataset, and thus different shadows and lights can be appreciated.

Fig. 14.

On the left, random images of wire sketches created with Matlab and used to train the CycleGAN model (domain A). On the right, random images of the rendered dataset, created with the Blender rendering software and used to train the CycleGAN model (domain B).
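To make the dataset setup concrete, the following is a minimal sketch, in Python/PyTorch, of how such unpaired image folders could be loaded and scaled to 400 × 400 pixels. The folder names, the normalisation to [-1, 1] and the PyTorch/torchvision pipeline are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of loading the two unpaired image domains (assumed folder
# names "sketches/" and "renders/"); images are resized to 400x400 as stated
# above. Illustrative only, not the authors' exact data pipeline.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class UnpairedImageDataset(Dataset):
    """Returns one image from domain A and one from domain B per index."""
    def __init__(self, dir_a="sketches", dir_b="renders", size=400):
        self.paths_a = sorted(os.path.join(dir_a, f) for f in os.listdir(dir_a))
        self.paths_b = sorted(os.path.join(dir_b, f) for f in os.listdir(dir_b))
        self.transform = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # map to [-1, 1]
        ])

    def __len__(self):
        return max(len(self.paths_a), len(self.paths_b))

    def __getitem__(self, idx):
        # The domains are unpaired, so the two indices are wrapped independently.
        img_a = Image.open(self.paths_a[idx % len(self.paths_a)]).convert("RGB")
        img_b = Image.open(self.paths_b[idx % len(self.paths_b)]).convert("RGB")
        return self.transform(img_a), self.transform(img_b)
```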

A.2 CycleGAN Model

To achieve cycle consistency between two domains, a CycleGAN requires two generators: the first generator (\(G_{AB}\)) translates from domain A to B, and the second generator (\(G_{BA}\)) translates from domain B back to A. Therefore, there are two cycle consistency losses, a forward one and a backward one, which enforce that \(x\approx G_{BA}(G_{AB}(x))\) and \(y\approx G_{AB}(G_{BA}(y))\).

CycleGAN Generator Architecture: The generator in the CycleGAN has layers that implement three stages of computation:

  1. The first stage encodes the input via a series of convolutional layers that extract image features.

  2. The second stage transforms these features by passing them through one or more residual blocks.

  3. The third stage decodes the transformed features using a series of transposed convolutional layers, building an output image of the same size as the input.

The residual block used in transformation stage 2 consists of a convolutional layer, where the input is added to the output of the convolution. This is done so that the characteristics of the output image (e.g., the shapes of objects) do not differ too much from the input. Figure 16 shows the proposed architecture with example paired images as input.
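The three-stage structure described above can be sketched in PyTorch as follows. The channel widths, the number of residual blocks and the use of instance normalisation are illustrative assumptions in the spirit of [21], not the exact published configuration.

```python
# Sketch of the three-stage CycleGAN generator: convolutional encoder,
# residual transformation blocks, transposed-convolution decoder.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        # Skip connection: the input is added to the convolved output, so the
        # transformed features stay close to the input's structure.
        return x + self.block(x)

class Generator(nn.Module):
    def __init__(self, in_channels=3, base=64, n_res_blocks=6):
        super().__init__()
        # Stage 1: encode the input with convolutional layers.
        encoder = [
            nn.Conv2d(in_channels, base, kernel_size=7, padding=3),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(base * 4), nn.ReLU(inplace=True),
        ]
        # Stage 2: transform the features with residual blocks.
        transformer = [ResidualBlock(base * 4) for _ in range(n_res_blocks)]
        # Stage 3: decode back to the input resolution with transposed convolutions.
        decoder = [
            nn.ConvTranspose2d(base * 4, base * 2, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.ReLU(inplace=True),
            nn.Conv2d(base, in_channels, kernel_size=7, padding=3),
            nn.Tanh(),  # outputs in [-1, 1], matching the input normalisation
        ]
        self.model = nn.Sequential(*encoder, *transformer, *decoder)

    def forward(self, x):
        return self.model(x)
```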

CycleGAN Discriminator Architecture: The discriminator of the CycleGAN is based on the PatchGAN architecture [6]. The difference between this architecture and the usual GAN discriminator is that, instead of outputting a single scalar, it outputs a matrix of values, each between 0 (fake) and 1 (real), classifying the corresponding portion of the image (Fig. 15).
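A minimal PatchGAN-style discriminator along these lines is sketched below; the layer configuration is an assumption in the spirit of [6], and a sigmoid is added so that each patch score lies between 0 (fake) and 1 (real), as described above.

```python
# Sketch of a PatchGAN discriminator: instead of a single real/fake score,
# it outputs a grid of scores, one per receptive-field patch of the input.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        def block(c_in, c_out, stride=2, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, base, norm=False),
            *block(base, base * 2),
            *block(base * 2, base * 4),
            *block(base * 4, base * 8, stride=1),
            nn.Conv2d(base * 8, 1, kernel_size=4, stride=1, padding=1),  # one score per patch
            nn.Sigmoid(),  # scores in (0, 1): near 1 = "looks real", near 0 = "looks fake"
        )

    def forward(self, x):
        return self.model(x)

# Each entry of the output matrix scores one patch of the input image.
scores = PatchDiscriminator()(torch.randn(1, 3, 400, 400))
print(scores.shape)  # torch.Size([1, 1, 48, 48]) for a 400x400 input
```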

Fig. 15.

Generator (top) and discriminator (bottom) architectures, with an example of the PatchGAN architecture, part of the CycleGAN discriminator, classifying a portion of the image. In this example, 0.8 is the score the discriminator gave to that patch of the image, i.e., the patch looks closer to a real image (1).

Fig. 16.

Proposed CycleGAN model to learn an unsupervised Sketch2Rendering mapping.

Losses: The objective of CycleGANs is to learn the mapping between domains X and Y given training examples \(x_i\in X\) and \(y_i\in Y\). The data distributions are \(x \sim p_{data}(x)\) and \(y\sim p_{data}(y)\). As shown in Fig. 16, the model includes two mappings, one learned by each generator, \(G_{AB} : X \rightarrow Y \) and \(G_{BA} : Y \rightarrow X \).

Apart from these generators, the model has two discriminators, one for each domain. \(D_A\) learns to distinguish between real images x and fake images \(x^*=G_{BA}(y)\), while \(D_B\) learns to distinguish between real images y and fake images \(y^*=G_{AB}(x)\). The objective function therefore contains two different losses: the adversarial losses [1], which measure whether the distribution of the generated images matches the data distribution of the target domain, and the cycle consistency losses [21], which ensure that \(G_{AB}\) and \(G_{BA}\) do not contradict each other.

Cycle Consistency Loss: It can be expressed as \(||x-G_{BA}(G_{AB}(x))||\) or \(||y-G_{AB}(G_{BA}(y))||\), depending on which domain we take as the starting point. These terms ensure that the original image and the twice-translated image, i.e., the image obtained after completing a full cycle, are the same. This loss function is expressed as:

$$\begin{aligned} \mathcal {L}_{cyc}(G_{AB},G_{BA})=&\mathbb {E}_{x\sim p_{data}(x)}[||G_{BA}(G_{AB}(x))-x||_1] \nonumber \\&+\mathbb {E}_{y\sim p_{data}(y)}[||G_{AB}(G_{BA}(y))-y||_1] \end{aligned}$$
(1)
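A direct translation of Eq. (1) into code might look as follows; this is a sketch, where g_ab and g_ba stand for the two generators and the mean L1 distance plays the role of the norm in Eq. (1).

```python
import torch.nn.functional as F

def cycle_consistency_loss(g_ab, g_ba, real_a, real_b):
    """Eq. (1): reconstruct each image after a full A->B->A (or B->A->B) cycle
    and penalise the L1 distance to the original."""
    rec_a = g_ba(g_ab(real_a))   # forward cycle:  x -> G_AB(x) -> G_BA(G_AB(x))
    rec_b = g_ab(g_ba(real_b))   # backward cycle: y -> G_BA(y) -> G_AB(G_BA(y))
    return F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b)
```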

Adversarial Loss: Apart from the cycle consistency loss, CycleGANs also use an adversarial loss during training. As in traditional GAN models, the adversarial loss measures whether the generated images look real, i.e., whether they are indistinguishable from images coming from the probability distribution learned from the training set [1]. For the mapping \(G_{AB} : X \rightarrow Y \) and the corresponding discriminator \(D_B\), we express the objective as:

$$\begin{aligned} \mathcal {L}_{GAN}(G_{AB},D_B,X,Y)=&\mathbb {E}_{y\sim p_{data}(y)}[\log D_B(y)] \nonumber \\&+ \mathbb {E}_{x\sim p_{data}(x)}[\log (1- D_B(G_{AB}(x)))] \end{aligned}$$
(2)

Every translation by the \(G_{AB}\) generator is checked by the \(D_{B}\) discriminator, and the output of generator \(G_{BA}\) is assessed by the \(D_{A}\) discriminator. Every time we translate from one domain to the other, the discriminator tests whether the output of the generator looks real or fake, and each generator tries to fool its adversary, the discriminator. While each generator tries to minimize the objective function, the corresponding discriminator tries to maximize it. The training objectives of this loss are \(\min _{G_{AB}}\max _{D_B}\mathcal {L}_{GAN}(G_{AB},D_B,X,Y)\) and \(\min _{G_{BA}}\max _{D_A}\mathcal {L}_{GAN}(G_{BA},D_A,Y,X)\).

Identity Loss: The identity loss measures whether the output of the CycleGAN preserves the overall color temperature and structure of the picture. A pixel-wise distance is used so that, ideally, there is no difference between the output and the input; this encourages the CycleGAN to change only the parts of the image that it needs to.

Model Training: The full objective of the CycleGAN combines these three loss functions; indeed, Zhu et al. show that training the networks with only one of them does not produce high-quality results. In the formula below, the cycle consistency and identity losses are weighted by \(\lambda _{cyc}\) and \(\lambda _{ident}\), respectively; these scalars control the importance of each loss during training. In our case, following the values for these parameters proposed in the original paper [21], \(\lambda _{cyc}\) is set to 10 and \(\lambda _{ident}\) to 0.1, as the identity loss only controls the tint of the background of the input and output images and, since our dataset is composed of the same colors, it does not have a large influence.

$$\begin{aligned} \mathcal {L}(G_{AB},G_{BA},D_A,D_B) =&\mathcal {L}_{GAN}(G_{AB},D_B,X,Y)+\mathcal {L}_{GAN}(G_{BA},D_A,Y,X) \nonumber \\&+\lambda _{cyc}\mathcal {L}_{cyc}(G_{AB},G_{BA}) +\lambda _{ident}\mathcal {L}_{ident}(G_{AB},G_{BA}) \end{aligned}$$
(3)
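The full objective of Eq. (3), with the weights \(\lambda _{cyc}=10\) and \(\lambda _{ident}=0.1\) used in this work, could be assembled as in the sketch below. The use of binary cross-entropy for the adversarial terms follows the log-likelihood form of Eq. (2) and assumes discriminators with outputs in [0, 1]; this is an illustrative assumption, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def generator_objective(g_ab, g_ba, d_a, d_b, real_a, real_b,
                        lambda_cyc=10.0, lambda_ident=0.1):
    """Eq. (3) from the generators' point of view: adversarial terms for both
    mappings plus the weighted cycle consistency and identity terms."""
    fake_b = g_ab(real_a)  # A -> B translation
    fake_a = g_ba(real_b)  # B -> A translation

    # Adversarial terms (Eq. 2): each generator tries to make its discriminator
    # score the fakes as real (target label 1).
    pred_b, pred_a = d_b(fake_b), d_a(fake_a)
    adv = F.binary_cross_entropy(pred_b, torch.ones_like(pred_b)) + \
          F.binary_cross_entropy(pred_a, torch.ones_like(pred_a))

    # Cycle consistency term (Eq. 1), weighted by lambda_cyc = 10.
    cyc = F.l1_loss(g_ba(fake_b), real_a) + F.l1_loss(g_ab(fake_a), real_b)

    # Identity term, weighted by lambda_ident = 0.1: a generator fed an image
    # that is already in its target domain should leave it unchanged.
    ident = F.l1_loss(g_ab(real_b), real_b) + F.l1_loss(g_ba(real_a), real_a)

    return adv + lambda_cyc * cyc + lambda_ident * ident
```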

A.3 CycleGAN Training Details

The networks were trained from scratch with a starting learning rate of 0.0002 for 100 epochs; after this, they were trained for 100 more epochs with a learning rate of 0.00002, as suggested by Zhu et al. in [21]. Following this procedure, the objective loss function of the discriminator D was divided by 2, which slows down the rate at which D learns compared with the generator G.
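One possible way to express this schedule and the halving of the discriminator objective is sketched below. The Adam optimiser settings, the batch loop and the hypothetical discriminator_objective helper (the discriminators' side of Eq. (2), with generated images detached) are assumptions for illustration; g_ab, g_ba, d_a, d_b and loader refer to the sketches above.

```python
import itertools
import torch

# Sketch of the schedule described above: learning rate 2e-4 for the first 100
# epochs, then 2e-5 for another 100, with the discriminator objective divided
# by 2 so that D learns more slowly than G. Optimiser settings are assumptions.
opt_g = torch.optim.Adam(itertools.chain(g_ab.parameters(), g_ba.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(itertools.chain(d_a.parameters(), d_b.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))

for epoch in range(200):
    if epoch == 100:  # switch to the lower learning rate after 100 epochs
        for opt in (opt_g, opt_d):
            for group in opt.param_groups:
                group["lr"] = 2e-5

    for real_a, real_b in loader:  # `loader` yields unpaired (A, B) batches
        # Update both generators on the full objective of Eq. (3).
        opt_g.zero_grad()
        generator_objective(g_ab, g_ba, d_a, d_b, real_a, real_b).backward()
        opt_g.step()

        # Update both discriminators; their loss is halved as described above.
        opt_d.zero_grad()
        (0.5 * discriminator_objective(d_a, d_b, g_ab, g_ba, real_a, real_b)).backward()
        opt_d.step()
```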

For the generator and discriminator we adopt the same architectures as the ones proposed by Zhu et al. [21], with the difference that for the first and last layers in the generator, we used a padding of 3 due to the input image size of our dataset.


Copyright information

© 2022 IFIP International Federation for Information Processing


Cite this paper

Cabezon Pedroso, T., Ser, J.D., Díaz-Rodríguez, N. (2022). Capabilities, Limitations and Challenges of Style Transfer with CycleGANs: A Study on Automatic Ring Design Generation. In: Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E. (eds) Machine Learning and Knowledge Extraction. CD-MAKE 2022. Lecture Notes in Computer Science, vol 13480. Springer, Cham. https://doi.org/10.1007/978-3-031-14463-9_11

  • DOI: https://doi.org/10.1007/978-3-031-14463-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14462-2

  • Online ISBN: 978-3-031-14463-9
