Abstract
Creative natural language generation, such as poetry generation, lyric writing, and storytelling, is appealing but difficult to evaluate. We take image-inspired poetry generation as a showcase application and investigate two evaluation problems: (1) how to evaluate generated text when there is no ground truth, and (2) how to evaluate nondeterministic systems that output different texts for the same input image. For the first problem, we design a judgment tool that shows assessors the inspiring image and collects their ratings of a few poems for comparison. We then propose a novelty measurement that quantifies how different a generated text is from a known corpus. For the second problem, we experiment with different strategies for approximating an evaluation over multiple trials of output poems. We also use a measure that quantifies the diversity of the texts generated in response to the same input image, and we discuss the merits of these strategies and measures.
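The abstract does not define the novelty or diversity measures, so the following is only a minimal sketch of how two such quantities could be computed, not the authors' formulation. It assumes novelty is the fraction of a poem's n-grams that are absent from a reference corpus, and that diversity is the mean pairwise Jaccard distance between the n-gram sets of several outputs for the same image. The function names, the choice of n, and both definitions are illustrative assumptions.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novelty(poem_tokens, corpus_ngrams, n=2):
    """Assumed definition: share of the poem's n-grams not seen in the corpus."""
    grams = ngrams(poem_tokens, n)
    if not grams:
        return 0.0
    return len(grams - corpus_ngrams) / len(grams)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; taken as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversity(poems, n=1):
    """Assumed definition: mean pairwise Jaccard distance over n-gram sets."""
    sets = [ngrams(p, n) for p in poems]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    corpus = ngrams("the moon is bright".split(), 2)
    print(novelty("the moon is cold".split(), corpus))
    print(diversity([["a", "b"], ["b", "c"]]))
```

Under these assumptions, a novelty of 0 means every n-gram of the poem already occurs in the corpus, and a diversity of 1 means no two outputs for the same image share any n-gram.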
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, CC., Song, R., Sakai, T., Cheng, WF., Xie, X., Lin, SD. (2019). Evaluating Image-Inspired Poetry Generation. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science(), vol 11838. Springer, Cham. https://doi.org/10.1007/978-3-030-32233-5_42
DOI: https://doi.org/10.1007/978-3-030-32233-5_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32232-8
Online ISBN: 978-3-030-32233-5
eBook Packages: Computer Science (R0)