
Customizable Text-to-Image Modeling by Contrastive Learning on Adjustable Word-Visual Pairs

  • Conference paper
Artificial Intelligence in HCI (HCII 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13336)


Abstract

Co-creation with AI is a growing trend, and AI generation of images from textual descriptions has shown advanced and attractive capabilities. However, commonly trained machine-learning models and AI-based systems may fail to produce satisfying results for personal use, or for novice users of painting and AI co-creation, because they may not adequately understand personal textual expressions and because trained text-to-image models offer little customization. We therefore aim to support the creation of flexible and diverse visual content from textual descriptions by developing neural-network models with machine learning. In our modeling, a Transformer generates synthesized images from word-visual co-occurrence, and images are synthesized by decoding visual tokens. To improve visual and textual expressions and their relevance, and to increase diversity, we apply contrastive learning to texts, to images, and to pairs of texts and images. In experiments on a dataset of birds, we found that rendering quality required neural networks of a certain scale, together with a training process that applies relatively low learning rates until the end of training. We further showed that contrastive learning can improve visual and textual expressions and their relevance.
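The contrastive learning on pairs of texts and images described in the abstract can be sketched as a symmetric, CLIP-style InfoNCE objective over a batch of matched text-image embeddings. This is an illustrative reconstruction, not the paper's exact loss: the function name, batch setup, and temperature value are assumptions.

```python
import numpy as np

def contrastive_text_image_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over a batch of matched (text, image) embeddings."""
    # Normalize so the dot product becomes cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy_diagonal(l):
        # Matched pairs sit on the diagonal; each row is treated as a
        # classification over the batch, scoring the true match.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the text->image and image->text directions.
    return (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits.T)) / 2

# Toy usage: perfectly aligned pairs should incur a much lower loss
# than randomly mismatched ones.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 32))
aligned_loss = contrastive_text_image_loss(text, text)
mismatched_loss = contrastive_text_image_loss(text, rng.normal(size=(4, 32)))
```

Minimizing this loss pulls each text embedding toward its paired image embedding and pushes it away from the other images in the batch, which is one way to strengthen the relevance between textual and visual expressions that the paper targets.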



Acknowledgement

This work was supported by Japan Science and Technology Agency (JST CREST: JPMJCR19F2, Research Representative: Prof. Yoichi Ochiai, University of Tsukuba, Japan), and by University of Tsukuba (Basic Research Support Program Type A).

Author information


Corresponding author

Correspondence to Jun-Li Lu .


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Lu, JL., Ochiai, Y. (2022). Customizable Text-to-Image Modeling by Contrastive Learning on Adjustable Word-Visual Pairs. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2022. Lecture Notes in Computer Science, vol 13336. Springer, Cham. https://doi.org/10.1007/978-3-031-05643-7_30


  • DOI: https://doi.org/10.1007/978-3-031-05643-7_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05642-0

  • Online ISBN: 978-3-031-05643-7

  • eBook Packages: Computer Science, Computer Science (R0)
