Abstract
Building on the success of the VU Amsterdam Metaphor Corpus, which comprises English texts annotated for metaphor following the Metaphor Identification Procedure Vrije Universiteit (MIPVU; Steen et al. 2010a, b), this study has three aims: (1) to adapt MIPVU for Mandarin Chinese and evaluate its transferability and reliability; (2) to construct a corpus of Chinese texts annotated for metaphor using the adapted procedure; and (3) to examine the distribution of metaphor-related words across Chinese texts in three written registers: academic discourse, fiction, and news. The results of our inter-annotator reliability test show that MIPVU can be reliably applied to linguistic metaphor identification in Chinese texts. Our metaphor-annotated corpus consists of texts randomly sampled from the Lancaster Corpus of Mandarin Chinese, totaling 30,012 words (about 10,000 for each register). Data analysis reveals that approximately one out of every nine lexical units in our Chinese corpus is related to metaphor, that there is considerable variation in metaphor density across registers and lexical categories, and that metaphor density is significantly lower in Chinese than in English texts. Our assessment of the replicability of MIPVU for Mandarin Chinese adds to the groundbreaking methodological contribution that Steen et al. (2010a, b) have made to metaphor research. The metaphor-annotated corpus of Mandarin Chinese contributes a valuable language resource for Chinese metaphor researchers, and our analysis of the distribution of metaphor-related words in the corpus offers useful new insights into the extent and use of metaphor in Chinese discourse.
Notes
The VU Amsterdam Metaphor Corpus is available at: http://ota.ahds.ac.uk/headers/2541.xml.
In this paper, the terms ‘lexical unit’ and ‘word’ are used interchangeably in contexts where the differentiation is not important.
The files in the science category cover a broad range of academic disciplines, including humanities, social sciences, natural sciences, engineering, etc.
The corpora consulted, besides LCMC, included the Chinese National Corpus and the Sinica Corpus.
We also recognize that, by staying as close as possible to the word segmentation results provided by LCMC with respect to the delimitation of lexical units, this study inherited the constraints shared by Chinese tokenizers and their practical solutions to some thorny cases of discontinuous compounds, in particular the VOCs split by aspect morphemes and the RVCs in the potential form (see the discussion in this section below). An alternative solution is to manually re-segment these split compounds, following the more widely held view in the Chinese linguistics literature (e.g., Chao 1968; Li and Thompson 1981) that treats the aspect markers within VOCs as suffixes and the potential markers de 得 and bù 不 between the two parts of RVCs as infixes.
It is also possible, albeit extremely rare, for other types of Chinese compounds to be used discontinuously, or “ionized” (Chao 1968) (e.g., yōumò 幽默 ‘humor’ → yōu le tā yī mò 幽了他一默 ‘to make a joke with him’ and kāngkǎi 慷慨 ‘generous’ → kāng tā rén zhī kǎi 慷他人之慨 ‘generous with other people’s goods’). Such pseudo-VOCs differ from regular VOCs in two key respects. First, the two parts of an ionized pseudo-VOC are not independent words and thus cannot be examined separately for their contextual and basic meanings. Second, the two parts of a pseudo-VOC do not stand in a verb-object relation, so these forms do not have the dual status of a morphologically complex word and a syntactic phrase that true VOCs have (cf. Packard 2000). Given these internal properties, the two split parts of an ionized pseudo-VOC are treated as a single lexical unit rather than as separate ones.
In all analyses, tokens tagged as MFlags are not counted as MRWs, but those tagged as WIDLLI are, so that unwarranted exclusion of linguistic metaphors can be avoided.
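The counting rule in the last note can be illustrated with a minimal sketch. It assumes, purely for illustration, that each lexical unit is stored as a (unit, tag) pair and that the tag strings are `"MRW"`, `"WIDLLI"`, and `"MFlag"`; these labels, the function name `metaphor_density`, and the toy sample are hypothetical and do not reflect the corpus's actual file format or data.

```python
# Hypothetical tag labels; the corpus's actual annotation format may differ.
COUNTED_AS_MRW = {"MRW", "WIDLLI"}  # borderline (WIDLLI) cases are kept in
# "MFlag" and untagged units (tag None) are not counted as MRWs.


def metaphor_density(tokens):
    """Return the share of lexical units counted as metaphor-related.

    `tokens` is an iterable of (lexical_unit, tag) pairs, where tag is
    None for units that carry no metaphor annotation.
    """
    tokens = list(tokens)
    if not tokens:
        return 0.0
    mrw_count = sum(1 for _, tag in tokens if tag in COUNTED_AS_MRW)
    return mrw_count / len(tokens)


# Invented placeholder units, not corpus data.
sample = [
    ("unit1", None),
    ("unit2", "MRW"),
    ("unit3", "MFlag"),
    ("unit4", "WIDLLI"),
]
density = metaphor_density(sample)
```

In this toy sample two of the four units are counted, so the MFlag token contributes to the denominator but not to the MRW count, while the WIDLLI token contributes to both.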
Abbreviations
- 1pl: First person plural pronoun
- adv: Adverbial marker (de 地)
- assoc: Associative (-de 的)
- cop: Copular
- nom: Nominalizer
- 3sg: Third person singular pronoun
- asp: Aspect marker
- cl: Classifier
- gen: Genitive (-de 的)
- neg: Negator
References
Ahrens, K., Chung, S.-F., & Huang, C.-R. (2003). Conceptual metaphors: Ontology-based representation and corpora driven mapping principles. In Proceedings of the ACL 2003 workshop on the lexicon and figurative language (pp. 36–42). Stroudsburg, PA: Association for Computational Linguistics.
Ahrens, K., Chung, S.-F., & Huang, C.-R. (2004). From lexical semantics to conceptual metaphors: Mapping principle verification with WordNet and SUMO. In Proceedings of the fifth Chinese lexical semantics workshop (pp. 99–106). Singapore: COLIPS.
Badryzlova, Y., Shekhtman, N., Isaeva, Y., & Kerimov, R. (2013). Annotating a Russian corpus of conceptual metaphor: A bottom-up approach. In Proceedings of the first workshop on metaphor in NLP (pp. 77–86). Stroudsburg, PA: Association for Computational Linguistics.
Cameron, L. (2003). Metaphor in educational discourse. New York/London: Continuum.
Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.
Charteris-Black, J. (2004). Corpus approaches to critical metaphor analysis. New York: Palgrave Macmillan.
Chen, K.-J., & Bai, M.-H. (1998). Unknown word detection for Chinese by a corpus-based learning method. Computational Linguistics and Chinese Language Processing, 3(1), 27–44.
Chen, K.-J., & Liu, S.-H. (1992). Word identification for Mandarin Chinese sentences. In Proceedings of the fourteenth conference on computational linguistics (pp. 101–107). Stroudsburg, PA: Association for Computational Linguistics.
Chiang, W.-Y., & Duann, R.-F. (2007). Conceptual metaphors for SARS: ‘War’ between whom? Discourse and Society, 18(5), 579–602.
Chiu, S.-H., & Chiang, W.-Y. (2011). FIGHT metaphors in legal discourse: What is unsaid in the story? Language and Linguistics, 12(4), 877–915.
Chung, S.-F., Ahrens, K., & Huang, C.-R. (2005). Source domains as concept domains in metaphorical expressions. Computational Linguistics and Chinese Language Processing, 10(4), 553–570.
Chung, S.-F., & Huang, C.-R. (2010). Using collocations to establish the source domains of conceptual metaphors. Journal of Chinese Linguistics, 38(2), 183–223.
Deignan, A. (2005). Metaphor and corpus linguistics. Amsterdam/Philadelphia: John Benjamins.
Deignan, A. (2015). MIP, the corpus, and dictionaries: What makes the best for metaphor analysis? Metaphor and the Social World, 5(1), 145–154.
Do Dinh, E.-L., & Gurevych, I. (2016). Token-level metaphor detection using neural networks. In Proceedings of the fourth workshop on metaphor in NLP (pp. 28–33). Stroudsburg, PA: Association for Computational Linguistics.
Dorst, A. G. (2011). Personification in discourse: Linguistic forms, conceptual structures and communicative functions. Language and Literature, 20(2), 113–135.
Duann, R.-F., & Huang, C.-R. (2015). When embodiment meets Generative Lexicon: The human body part metaphors in Sinica Corpus. In Proceedings of the twenty-ninth Pacific Asia conference on language, information and computation (pp. 396–403).
Dunn, J. (2013). What metaphor identification systems can tell us about metaphor-in-language. In Proceedings of the first workshop on metaphor in NLP (pp. 1–10). Stroudsburg, PA: Association for Computational Linguistics.
Gibbs, R. W. (Ed.). (2008). The Cambridge handbook of metaphor and thought. Cambridge/New York: Cambridge University Press.
Goatly, A. (1997). The language of metaphors. New York: Routledge.
Gong, S.-P., Ahrens, K., & Huang, C.-R. (2008). Chinese Word Sketch and mapping principles: A corpus-based study of conceptual metaphors using the BUILDING source domain. International Journal of Computer Processing of Languages, 21(1), 3–17.
Gray, B. (2010). On the use of demonstrative pronouns and determiners as cohesive devices: A focus on sentence-initial this/these in academic prose. Journal of English for Academic Purposes, 9(3), 167–183.
Haagsma, H., & Bjerva, J. (2016). Detecting novel metaphor using selectional preference information. In Proceedings of the fourth workshop on metaphor in NLP (pp. 10–17). Stroudsburg, PA: Association for Computational Linguistics.
Han, C. (2014). Metaphor and entertainment: A corpus-based approach to language in Chinese online news. New York: Palgrave Macmillan.
Hsieh, S. C.-Y. (2006). A corpus-based study on animal expressions in Mandarin Chinese and German. Journal of Pragmatics, 38, 2206–2222.
Huang, C.-R., & Ahrens, K. (2003). Individuals, kinds and events: Classifier coercion of nouns. Language Sciences, 25(4), 353–373.
Huang, C.-R., Chen, K.-J., Chen, F.-Y., & Chang, L.-L. (1997). Segmentation standard for Chinese natural language processing. Computational Linguistics and Chinese Language Processing, 2(2), 47–62.
Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., et al. (2010). 中文词汇网络: 跨语言知识处理基础架构的设计理念与实践 [Chinese WordNet: Design, implementation, and application of an infrastructure for cross-lingual knowledge processing]. 中文信息学报 [Journal of Chinese Information Processing], 24(3), 14–23.
Huang, C.-R., & Xue, N. (2015). Modeling word concepts without convention: Linguistic and computational issues in Chinese word identification. In W. S.-Y. Wang & C.-F. Sun (Eds.), The Oxford handbook of Chinese linguistics (pp. 348–361). New York: Oxford University Press.
Jing-Schmidt, Z., & Peng, X. (2017). Winds and tigers: Metaphor choice in China’s anti-corruption discourse. Lingua Sinica. doi:10.1186/s40655-016-0017-9.
Kövecses, Z. (2005). Metaphor in culture: Universality and variation. Cambridge/New York: Cambridge University Press.
Kövecses, Z. (2010). Metaphor: A practical introduction (2nd ed.). New York: Oxford University Press.
Lakoff, G., & Johnson, M. (1980/2003). Metaphors we live by. Chicago: University of Chicago Press.
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western thought. New York: Basic Books.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago/London: University of Chicago Press.
Lee, L.-H., Hsieh, S.-K., & Huang, C.-R. (2009). CWN-LMF: Chinese WordNet in the lexical markup framework. In Proceedings of the seventh workshop on Asian language resources (pp. 123–130). Stroudsburg, PA: Association for Computational Linguistics.
Li, C. N., & Thompson, S. A. (1981). Mandarin Chinese: A functional reference grammar. Berkeley: University of California Press.
Lu, X. (2007). A hybrid model for Chinese word segmentation. Journal for Language Technology and Computational Linguistics, 22(1), 71–88.
Lu, X. (2008). Improving part-of-speech guessing of Chinese unknown words using hybrid models. International Journal of Corpus Linguistics, 13(2), 169–193.
Lu, L. W.-L., & Ahrens, K. (2008). Ideological influence on BUILDING metaphors in Taiwanese presidential speeches. Discourse and Society, 19(3), 383–408.
McEnery, T., & Xiao, R. (2004). The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. In M. Lino, M. Xavier, F. Ferreira, R. Costa, & R. Silva (Eds.), Proceedings of the fourth international conference on language resources and evaluation (pp. 1175–1178). Lisbon: ELRA.
Nacey, S. (2013). Metaphors in learner English. Amsterdam/Philadelphia: John Benjamins.
Packard, J. L. (2000). The morphology of Chinese: A linguistic and cognitive approach. Cambridge/New York: Cambridge University Press.
Partington, A. (2006). Metaphors, motifs and similes across discourse types: Corpus-Assisted Discourse Studies (CADS) at work. In A. Stefanowitsch & S. T. Gries (Eds.), Corpus-based approaches to metaphor and metonymy (pp. 267–304). Berlin/New York: Mouton de Gruyter.
Pasma, T. (2012). Metaphor identification in Dutch discourse. In F. MacArthur, J. L. Oncins-Martínez, M. Sánchez-García, & A. M. Piquer-Píriz (Eds.), Metaphor in use: Context, culture, and communication (pp. 69–83). Amsterdam/Philadelphia: John Benjamins.
Pragglejaz Group. (2007). MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1), 1–39.
Ross, C. (1990). Resultative verb compounds. Journal of the Chinese Language Teachers Association, 25(3), 61–83.
Semino, E., Heywood, J., & Short, M. (2004). Methodological problems in the analysis of metaphors in a corpus of conversations about cancer. Journal of Pragmatics, 36(7), 1271–1294.
Shi, Y. (2002). The establishment of modern Chinese grammar: The formation of the resultative construction and its effects. Amsterdam/Philadelphia: John Benjamins.
Shutova, E., Devereux, B. J., & Korhonen, A. (2013). Conceptual metaphor theory meets the data: A corpus-based human annotation study. Language Resources and Evaluation, 47(4), 1261–1284.
Steen, G. J. (2007). Finding metaphor in grammar and usage. Amsterdam/Philadelphia: John Benjamins.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., & Krennmayr, T. (2010a). Metaphor in usage. Cognitive Linguistics, 21(4), 765–796.
Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., Krennmayr, T., & Pasma, T. (2010b). A method for linguistic metaphor identification: From MIP to MIPVU. Amsterdam/Philadelphia: John Benjamins.
Stefanowitsch, A. (2006). Corpus-based approaches to metaphor and metonymy. In A. Stefanowitsch & S. T. Gries (Eds.), Corpus-based approaches to metaphor and metonymy (pp. 1–16). Berlin/New York: Mouton de Gruyter.
Stefanowitsch, A., & Gries, S. T. (Eds.). (2006). Corpus-based approaches to metaphor and metonymy. Berlin/New York: Mouton de Gruyter.
Sun, C. (2006). Chinese: A linguistic introduction. New York: Cambridge University Press.
Sun, M., & Zou, J. (2001). 汉语自动分词研究评述 [A critical appraisal of the research on Chinese word segmentation]. 当代语言学 [Contemporary Linguistics], 3(1), 22–32.
Tai, J. H.-Y. (1994). Chinese classifier systems and human categorization. In M. Chen & O. Tzeng (Eds.), In honor of William S.-Y. Wang: Interdisciplinary studies on language and language change (pp. 479–494). Taipei: Pyramid Press.
Tai, J. H.-Y., & Chao, F.-Y. (1994). A semantic study of the classifier zhang. Journal of the Chinese Language Teachers Association, 29(3), 67–78.
Tao, L. (1996). Topic discontinuity and zero anaphora in Chinese discourse. In B. A. Fox (Ed.), Studies in anaphora (pp. 487–513). Amsterdam/Philadelphia: John Benjamins.
Tay, D. (2015). Metaphor in case study articles on Chinese university counseling service websites. Chinese Language and Discourse, 6(1), 28–56.
Thompson, S. A. (1973). Resultative verb compounds in Mandarin Chinese: A case for lexical rules. Language, 49(2), 361–379.
Veale, T., Shutova, E., & Beigman Klebanov, B. (2016). Metaphor: A computational perspective. San Rafael, CA: Morgan & Claypool Publishers.
Xiao, R. (2017). Lancaster Corpus of Mandarin Chinese. In R. Sybesma (Ed.), Encyclopedia of Chinese language and linguistics. Amsterdam: Brill.
Yu, N. (1998). The contemporary theory of metaphor: A perspective from Chinese. Amsterdam/Philadelphia: John Benjamins.
Yu, N. (2008). Metaphor from body and culture. In R. W. Gibbs (Ed.), The Cambridge handbook of metaphor and thought (pp. 247–261). Cambridge/New York: Cambridge University Press.
Yu, N., & Jia, D. (2016). Metaphor in culture: LIFE IS A SHOW in Chinese. Cognitive Linguistics, 27(2), 147–180.
Zhang, H. (2007). Numeral classifiers in Mandarin Chinese. Journal of East Asian Linguistics, 16(1), 43–59.
Zhang, H.-P., & Liu, Q. (2002). 基于 N-最短路径方法的中文词语粗分模型 [Model of Chinese words rough segmentation based on N-shortest-paths method]. 中文信息学报 [Journal of Chinese Information Processing], 16(5), 1–7.
Chinese Dictionaries
Commercial Press Dictionary Research Centre. (2010). Xiandai Hanyu Xuexi Cidian [A Learner’s Dictionary of Modern Chinese]. Beijing: Commercial Press.
Li, X. (Ed.). (2014). Xiandai Hanyu Guifan Cidian (3rd ed.) [A Standard Dictionary of Modern Chinese]. Beijing: Foreign Language Teaching and Research Press.
Chinese Lexical Resources
Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus). http://asbc.iis.sinica.edu.tw/.
Chinese WordNet. http://cwn.ling.sinica.edu.tw/.
Chinese National Corpus. http://www.cncorpus.org/index.aspx.
The Lancaster Corpus of Mandarin Chinese. http://www.lancaster.ac.uk/fass/projects/corpus/LCMC/.
Acknowledgements
We are grateful to the anonymous reviewers for their insightful comments and suggestions on earlier drafts of this manuscript. This project was funded by a Gil Watz Early Career Professorship in Language and Linguistics to the first author. Special thanks go to Chan-Chia Hsu and Eric Po-Chung Lin for their participation in the inter-annotator reliability test. We would also like to thank Haiyang Ai for his assistance at the early stage of this project. Any remaining inadequacies are our sole responsibility.
Cite this article
Lu, X., Wang, B.PY. Towards a metaphor-annotated corpus of Mandarin Chinese. Lang Resources & Evaluation 51, 663–694 (2017). https://doi.org/10.1007/s10579-017-9392-9
DOI: https://doi.org/10.1007/s10579-017-9392-9