A Corpus of Quotation Element Annotation for Chinese Novels: Construction, Extraction and Application

Xie, Jinge; Yan, Yuchen; Liu, Chen; Jia, Yuxiang; Zan, Hongying

doi:10.1007/978-981-99-8178-6_5


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1967)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Quotations, or dialogues, are important in literary works such as novels. In Jin Yong’s famous novels, about half of all sentences contain quotations. Quotation elements such as the speaker, speech mode, speech cue and the quotation itself are very useful for analyzing fictional characters. To build models for automatic quotation element extraction, we construct the first quotation corpus annotated with all four quotation elements; with 31,922 quotations, it is, to our knowledge, one of the largest quotation corpora. Based on the corpus, we compare different models for quotation element extraction and conduct extensive experiments. As applications of the extracted quotation elements, we explore character recognition and gender classification, and find that the quotation and the speech mode are effective for both tasks. We will extend our work from Jin Yong’s novels to other novels to analyze various characters from different angles based on quotation structures.
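To make the four quotation elements concrete, here is a minimal Python sketch. The `QuotationAnnotation` record, its field names, and the helpers `extract_quotations` and `quotation_ratio` are hypothetical illustrations, not the corpus's actual schema or the authors' extraction models. The regular expression assumes that quotations are enclosed in Chinese double quotation marks “…” (some editions use 「…」 instead), and the example sentences are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical record for one annotated quotation. The field names are
# illustrative assumptions, not the corpus's real annotation format.
@dataclass
class QuotationAnnotation:
    quotation: str                     # the quoted speech itself
    speaker: Optional[str] = None      # the character who utters it
    speech_mode: Optional[str] = None  # how the speech is delivered (illustrative)
    speech_cue: Optional[str] = None   # word introducing the speech, e.g. a speech verb (illustrative)


# Assumption: quotations are delimited by Chinese double quotation marks “…”.
QUOTE_RE = re.compile(r"“([^”]*)”")


def extract_quotations(sentence: str) -> List[str]:
    """Return every span enclosed in “…” (a naive mark-based baseline)."""
    return QUOTE_RE.findall(sentence)


def quotation_ratio(sentences: List[str]) -> float:
    """Fraction of sentences that contain at least one quotation."""
    if not sentences:
        return 0.0
    with_quotes = sum(1 for s in sentences if extract_quotations(s))
    return with_quotes / len(sentences)


if __name__ == "__main__":
    # Invented example sentences, not taken from the corpus.
    sentences = ["郭靖道：“我不去。”", "黄蓉笑了笑。"]
    print(extract_quotations(sentences[0]))  # ['我不去。']
    print(quotation_ratio(sentences))        # 0.5
    # Manually filled record; speech_mode is left unannotated here.
    print(QuotationAnnotation(quotation="我不去。", speaker="郭靖", speech_cue="道"))
```

Computed over all sentences of a novel, a ratio like this is one way to check a figure such as "about half of all sentences contain quotations". A punctuation-based baseline recovers only the quotation span; the speaker, speech mode and speech cue are what the extraction models compared in the paper are needed for.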



Author information

Authors and Affiliations

School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, China
Jinge Xie, Yuchen Yan, Chen Liu, Yuxiang Jia & Hongying Zan


Corresponding author

Correspondence to Yuxiang Jia.

Editor information

Editors and Affiliations

Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangdong, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li


About this paper

Cite this paper

Xie, J., Yan, Y., Liu, C., Jia, Y., Zan, H. (2024). A Corpus of Quotation Element Annotation for Chinese Novels: Construction, Extraction and Application. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1967. Springer, Singapore. https://doi.org/10.1007/978-981-99-8178-6_5


DOI: https://doi.org/10.1007/978-981-99-8178-6_5
Published: 30 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8177-9
Online ISBN: 978-981-99-8178-6
eBook Packages: Computer Science, Computer Science (R0)
