iPoet: interactive painting poetry creation with visual multimodal analysis

Feng, Yingchaojie; Chen, Jiazhou; Huang, Keyu; Wong, Jason K.; Ye, Hui; Zhang, Wei; Zhu, Rongchen; Luo, Xiaonan; Chen, Wei

doi:10.1007/s12650-021-00780-0

iPoet: interactive painting poetry creation with visual multimodal analysis

Regular Paper
Published: 19 November 2021

Volume 25, pages 671–685, (2022)
Cite this article

Journal of Visualization Aims and scope Submit manuscript

Yingchaojie Feng ORCID: orcid.org/0000-0002-1418-4635¹,
Jiazhou Chen²,
Keyu Huang²,
Jason K. Wong³,
Hui Ye¹,
Wei Zhang¹,
Rongchen Zhu¹,
Xiaonan Luo⁴ &
…
Wei Chen¹

987 Accesses
Explore all metrics

Abstract

Chinese painting poetry is an extraordinary aesthetic phenomenon in world art history. It is not only part of the paintings but also helps us to better understand the spiritual conception that the artists express. In this paper, we present an interactive visual system to enable ordinary users to compose customized painting poetry for ancient Chinese paintings, which contain three properties: (1) We employ object detection and image captioning to describe the scenery depicted in the painting. (2) We extend the modern color theory to analyze the underlying emotions of each painting. (3) We propose an interactive poetry generation method that takes the content description and the emotional expression to add the diversity of the poetry creation. Several visual components are carefully designed to visualize and contextualize the features in the painting. They effectively guide users to steer the creation of personalized painting poems. We conduct efficient case studies and user interviews to demonstrate the effectiveness of our system.

Graphic abstract

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 6

Visual Saliency for the Visualization of Digital Paintings

Interactive Rakuchu Rakugai-zu (Views in and Around Kyoto)

A Study of Digital Media Art Utilizing 2D Animation: Digital Video Expression Using Projection Mapping and Multi Screen Techniques

Notes

References

Anderson P, Fernando B, Johnson M, Gould S (2016) Spice: Semantic propositional image caption evaluation. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision - ECCV 2016. Springer, pp 382–398
Chen H, Yi X, Sun M, Li W, Yang C, Guo Z (2019) Sentiment-controllable chinese poetry generation. pp 4925–4931
Cheng W-F, Wu C-C, Song R, Fu J, Xie X, Nie J-Y (2018) Image inspired poetry generation in xiaoice. arXiv preprintarXiv:1808.03090
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprintarXiv:1406.1078
Giovannangeli L, Bourqui R, Giot R, Auber D (2020) Toward automatic comparison of visualization techniques: application to graph visualization. Vis Inform 4(2):86–98
Article Google Scholar
Han D, Pan J, Zhao X, Chen W (2021) Netv. js: a web-based library for high-efficiency visualization of large-scale graphs and networks. Vis Inform 5(1):61–66
Article Google Scholar
Hu H (2018) Visualization design and research of the style and sects change of song ci. Harbin Institute Of Technology (Master’s thesis)
Hu Z, Yang Z, Liang X, Salakhutdinov R, Xing EP (2017) Toward controlled generation of text. In: Proceedings of ICML, pp 1587–1596
Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of CVPR, pp 3296–3297
Johnson J, Krishna R, Stark M, Li L-J, Shamma DA, Bernstein MS, Fei-Fei L (2015) Image retrieval using scene graphs. In: Proceedings of CVPR, pp 3668–3678
Kaneko A, Komatsu A, Itoh T, Wang FY (2020) Painting image browser applying an associate-rule-aware multidimensional data visualization technique. Vis Comput Ind Biomed Art 3(1):1–13
Article Google Scholar
Kang D, Shim H, Yoon K (2018) A method for extracting emotion using colors comprise the painting image. Multimed Tools Appl 77(4):4985–5002
Article Google Scholar
Karpathy A, Li F (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of CVPR, pp 3128–3137
Kulkarni G, Premraj V, Ordonez V, Dhar S, Li S, Choi Y, Berg AC, Berg TL (2013) Babytalk: understanding and generating simple image descriptions. TPAMI 35(12):2891–2903
Article Google Scholar
Leite RA, Arleo A, Sorger J, Gschwandtner T, Miksch S (2020) Hermes: guidance-enriched visual analytics for economic network exploration. Vis Inform 4(4):11–22
Article Google Scholar
Li Y, Fujiwara T, Choi YK, Kim KK, Ma K-L (2020) A visual analytics system for multi-model comparison on clinical data predictions. Vis Inform 4(2):122–131
Article Google Scholar
Liu L, Wan X, Guo Z (2018) Images2poem: Generating Chinese poetry from image streams. In: Proceedings of ACMMM, pp 1967–1975
Lu C, Krishna R, Bernstein M, Fei-Fei L (2016) Visual relationship detection with language priors, vol 9905, pp 852–869
Lu J, Xiong C, Parikh D, Socher R (2017) Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In: Proceedings of CVPR, pp 3242–3250
McCurdy N, Lein J, Coles K, Meyer M (2015) Poemage: visualizing the sonic topology of a poem. TVCG 22(1):439–448
Google Scholar
Meneses L, Furuta R (2015) Visualizing poetry: Tools for critical analysis. paj: J Init Digit Hum Med Cult 3:1
Article Google Scholar
Newell A, Deng J (2017) Pixels to graphs by associative embedding, vol NIPS’17. Curran Associates Inc., Red Hook, NY, USA, pp 2168–2177
Pinaud B, Vallet J, Melançon G (2020) On visualization techniques comparison for large social networks overview: a user experiment. Vis Inform 4(4):23–34
Article Google Scholar
Ren S, He K, Girshick R, Sun J (2016) Faster r-cnn: towards real-time object detection with region proposal networks. TPAMI 39(6):1137–1149
Article Google Scholar
Schuster S, Krishna R, Chang A, Fei-Fei L, Manning C (2015) Generating semantically precise scene graphs from textual descriptions for improved image retrieval. pp 70–80
Shi L, Liao Q, Tong H, Hu Y, Wang C, Lin C, Qian W (2020) Oniongraph: Hierarchical topology+ attribute multivariate network visualization. Vis Inform 4(1):43–57
Article Google Scholar
Shu X, Wu J, Wu X, Liang H, Cui W, Wu Y, Qu H (2021) Dancingwords: exploring animated word clouds to tell stories. J Vis 24(1):85–100
Article Google Scholar
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint:1409.1556
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of CVPR, pp 2818–2826
Takahashi F, Kawabata Y (2018) The association between colors and emotions for emotional words and facial expressions. Color Res Appl 43(2):247–257
Article Google Scholar
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of CVPR, pp 3156–3164
Wang X, Zeng H, Wang Y, Wu A, Sun Z, Ma X, Qu H (2020) Voicecoach: Interactive evidence-based training for voice modulation skills in public speaking. In: Proceedings of CHI, pp 1–12. ACM
Wang Y, Haleem H, Shi C, Wu Y, Zhao X, Fu S, Qu H (2018) Towards easy comparison of local businesses using online reviews. Comput Gr Forum 37(3):63–74
Article Google Scholar
Wang Z, He W, Wu H, Wu H, Li W, Wang H, Chen E (2016) Chinese poetry generation with planning based neural network. arXiv preprint arXiv:1610.09889
Wu L, Xu M, Qian S, Cui J (2020) Image to modern chinese poetry creation via a constrained topic-aware model. TOMM 16(2):1–21
Article Google Scholar
Xu D, Zhu Y, Choy C, Fei-Fei L (2017) Scene graph generation by iterative message passing. pp 3097–3106
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of ICML, pp 2048–2057
Xu L, Jiang L, Qin C, Wang Z, Du D (2018) How images inspire poems: Generating classical chinese poetry from images with memory networks. In: Proceedings of AAAI, vol 32
Yan R (2016) i, poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. pp 2238–2244
Yang J, Fan J, Hubball D, Gao Y, Luo H, Ribarsky W, Ward M (2006) Semantic image browser: bridging information visualization with automated intelligent image analysis, pp 191–198
Yang X, Tang K, Zhang H, Cai J (2019) Auto-encoding scene graphs for image captioning. In: Proceedings of CVPR, pp 10677–10686
Yi X, Li R, Yang C, Li W, Sun M (2020) Mixpoet: diverse poetry generation via learning controllable mixed latent space. Proc AAAI 34:9450–9457
Article Google Scholar
Yi X, Sun M, Li R, Yang Z (2018) Chinese poetry generation with a working memory model. arXiv preprint arXiv:1809.04306
Zhang W, Siwei T, Liu K, Lei S, Chen S, Chen W (2019) A new perspective on the study of literature (songci): text correlation and spatio-temporal visual analytics. J Comput-Aided Des Comput Gr 31(10):1687–1697
Google Scholar
Zhang X, Lapata M (2014) Chinese poetry generation with recurrent neural networks. In: Proceedings of EMNLP, pp 670–680
Zhao Y, Jiang H, Qin Y, Xie H, Wu Y, Liu S, Zhou Z, Xia J, Zhou F et al (2020) Preserving minority structures in graph sampling. IEEE Trans Vis Comput Gr 27(2):1698–1708
Article Google Scholar
Zhao Y, Luo X, Lin X, Wang H, Kui X, Zhou F, Wang J, Chen Y, Chen W (2019) Visual analytics for electromagnetic situation awareness in radio monitoring and management. IEEE Trans Vis Comput Gr 26(1):590–600
Article Google Scholar
Zhou F, Lin X, Liu C, Zhao Y, Xu P, Ren L, Xue T, Ren L (2019) A survey of visualization for smart manufacturing. J Vis 22(2):419–435
Article Google Scholar
Zhou H, Huang M, Zhang T, Zhu X, Liu B (2018) Emotional chatting machine: emotional conversation generation with internal and external memory. In: Proceedings of AAAI, vol 32

Download references

Acknowledgements

This work is supported by National Natural Science Foundation of China (61972122, 61772456).

Author information

Authors and Affiliations

State Key Lab of CAD&CG, Zhejiang University, Zhejiang, China
Yingchaojie Feng, Hui Ye, Wei Zhang, Rongchen Zhu & Wei Chen
College of Computer Science and Technology, Zhejiang University of Technology, Zhejiang, China
Jiazhou Chen & Keyu Huang
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Jason K. Wong
Guilin University of Electronic Technology, Guilin, China
Xiaonan Luo

Authors

Yingchaojie Feng
View author publications
You can also search for this author in PubMed Google Scholar
Jiazhou Chen
View author publications
You can also search for this author in PubMed Google Scholar
Keyu Huang
View author publications
You can also search for this author in PubMed Google Scholar
Jason K. Wong
View author publications
You can also search for this author in PubMed Google Scholar
Hui Ye
View author publications
You can also search for this author in PubMed Google Scholar
Wei Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Rongchen Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaonan Luo
View author publications
You can also search for this author in PubMed Google Scholar
Wei Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jiazhou Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Feng, Y., Chen, J., Huang, K. et al. iPoet: interactive painting poetry creation with visual multimodal analysis. J Vis 25, 671–685 (2022). https://doi.org/10.1007/s12650-021-00780-0

Download citation

Received: 08 July 2021
Accepted: 18 July 2021
Published: 19 November 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s12650-021-00780-0

Keywords