iPoet: interactive painting poetry creation with visual multimodal analysis

  • Regular Paper
Journal of Visualization

Abstract

Chinese painting poetry is an extraordinary aesthetic phenomenon in world art history. It is not only part of the painting but also helps viewers better understand the spiritual conception that the artist expresses. In this paper, we present an interactive visual system that enables ordinary users to compose customized painting poetry for ancient Chinese paintings. The system has three key features: (1) We employ object detection and image captioning to describe the scenery depicted in the painting. (2) We extend modern color theory to analyze the underlying emotions of each painting. (3) We propose an interactive poetry generation method that takes the content description and the emotional expression as input to increase the diversity of the poems created. Several visual components are carefully designed to visualize and contextualize the features of the painting, and they guide users in steering the creation of personalized painting poems. We conduct case studies and user interviews to demonstrate the effectiveness of our system.
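As a rough illustration of the color-based emotion analysis in feature (2), the sketch below bins pixel hues and maps each bin to an emotion label. It is not the method used in the paper: the hue ranges, the emotion labels, the saturation and brightness thresholds, and the function names `emotion_of_pixel` and `emotion_distribution` are all assumptions made for this example.

```python
import colorsys
from collections import Counter

# Hypothetical hue ranges (in degrees) and emotion labels, chosen only for
# illustration; they are NOT taken from the paper.
HUE_EMOTIONS = [
    (0, 60, "warm"),
    (60, 180, "calm"),
    (180, 300, "melancholy"),
    (300, 360, "warm"),
]


def emotion_of_pixel(rgb):
    """Map one (r, g, b) pixel with channels in 0..255 to an emotion label."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Very gray or very dark pixels carry little hue information
    # (the 0.15 thresholds are assumed values).
    if s < 0.15 or v < 0.15:
        return "neutral"
    hue_deg = h * 360.0
    for lo, hi, label in HUE_EMOTIONS:
        if lo <= hue_deg < hi:
            return label
    return "neutral"


def emotion_distribution(pixels):
    """Return the fraction of pixels assigned to each emotion label."""
    counts = Counter(emotion_of_pixel(p) for p in pixels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    sample = [(255, 0, 0), (0, 0, 255), (0, 0, 255), (128, 128, 128)]
    print(emotion_distribution(sample))
```

A distribution like this could, in principle, be passed to a poetry generator as an emotional condition alongside the content description; how the actual system combines the two is described in the full paper, not here.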

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61972122, 61772456).

Author information

Corresponding author

Correspondence to Jiazhou Chen.

About this article

Cite this article

Feng, Y., Chen, J., Huang, K. et al. iPoet: interactive painting poetry creation with visual multimodal analysis. J Vis 25, 671–685 (2022). https://doi.org/10.1007/s12650-021-00780-0

  • DOI: https://doi.org/10.1007/s12650-021-00780-0
