Towards a metaphor-annotated corpus of Mandarin Chinese

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Building on the success of the VU Amsterdam Metaphor Corpus, which comprises English texts annotated with metaphor following the Metaphor Identification Procedure Vrije Universiteit (MIPVU; Steen et al. in Cogn Linguist 21(4):765–796, 2010a; Steen et al. in A method for linguistic metaphor identification: from MIP to MIPVU. John Benjamins, Amsterdam/Philadelphia, 2010b), this study has three aims: (1) to adapt and evaluate the transferability and reliability of MIPVU for Mandarin Chinese; (2) to construct a corpus of Chinese texts annotated for metaphor using the adapted procedure; and (3) to examine the distribution of metaphor-related words across Chinese texts in three different written registers: academic discourse, fiction, and news. The results of our inter-annotator reliability test show that MIPVU can be reliably applied to linguistic metaphor identification in Chinese texts. Our metaphor-annotated corpus consists of texts randomly sampled from the Lancaster Corpus of Mandarin Chinese, totaling 30,012 words (about 10,000 for each register). Data analysis reveals that approximately one out of every nine lexical units in our Chinese corpus is related to metaphor, that there is considerable variation in metaphor density across different registers and lexical categories, and that metaphor density is significantly lower in Chinese than in English texts. Our assessment of the replicability of MIPVU for Mandarin Chinese adds to the groundbreaking methodological contribution that Steen et al. (2010a, b) have made to metaphor research. The metaphor-annotated corpus of Mandarin Chinese contributes a valuable language resource for Chinese metaphor researchers, and our analysis of the distribution of metaphor-related words in the corpus offers useful new insights into the extent and use of metaphor in Chinese discourse.

Notes

  1. The VU Amsterdam Metaphor Corpus is available at: http://ota.ahds.ac.uk/headers/2541.xml.

  2. In this paper, the terms ‘lexical unit’ and ‘word’ are used interchangeably in contexts where the differentiation is not important.

  3. The files in the science category cover a broad range of academic disciplines, including humanities, social sciences, natural sciences, engineering, etc.

  4. The corpora consulted, besides LCMC, included the Chinese National Corpus and the Sinica Corpus.

  5. Among previous studies that applied the earlier version of the annotation protocol (MIP) to Chinese, some (e.g. Duann and Huang 2015; Han 2014) also adapted the procedure to include metaphorical interpretations involved in the internal structures of Chinese characters and compound words.

  6. We also recognize that, by staying as close as possible to the word segmentation results provided by LCMC with respect to the delimitation of lexical units, this study inherited the constraints shared by Chinese tokenizers and their practical solutions to some thorny cases of discontinuous compounds, in particular the VOCs split by aspect morphemes and the RVCs in the potential form (see the discussion in this section below). An alternative solution is to manually re-segment these split compounds, based on the more widely held view in the Chinese linguistics literature (e.g., Chao 1968; Li and Thompson 1981) that treats the aspect markers within VOCs as suffixes and the potential markers de 得 and bu 不 between the two parts of RVCs as infixes.

  7. It is also possible, albeit extremely rare, for other types of Chinese compounds to be used discontinuously, or “ionized” (Chao 1968) (e.g. yōumò 幽默 ‘humor’ → yōu le tā yī 幽了他一默 ‘to make a joke with him’ and kāngkǎi 慷慨 ‘generous’ → kāng tā rén zhī kǎi 慷他人之慨 ‘generous with other people’s goods’). Such pseudo-VOCs differ from regular VOCs in two key aspects. First, the two parts of pseudo-VOCs in ionization are not independent words and thus cannot be examined separately for their contextual and basic meanings. Second, the two parts of pseudo-VOCs do not have a verb-object relation, so these forms would not have the dual status of a morphologically complex word and a syntactic phrase as true VOCs do (cf. Packard 2000). Given such internal properties of pseudo-VOCs, in cases of ionization, their two split parts are treated as a single lexical unit rather than separate ones.

  8. In all analyses, tokens tagged as MFlags are not considered MRWs, but those tagged as WIDLLI are, so that unwarranted exclusion of linguistic metaphors can be avoided.

  9. Work on automated metaphor detection using the VUAMC is already under way (e.g. Do Dinh and Gurevych 2016; Dunn 2013; Haagsma and Bjerva 2016).
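
As a rough illustration of the counting rule in note 8 and of the per-register metaphor density mentioned in the abstract, the following Python sketch counts the share of lexical units treated as metaphor-related in each register. The input layout (one `(register, tag)` pair per lexical unit), the function name `metaphor_density`, and the toy sample are assumptions made for illustration only; they are not the authors' data format, tooling, or results.

```python
from collections import Counter

# Tags counted as metaphor-related, following note 8:
# WIDLLI tokens are included, MFlag tokens are not.
# The exact tag strings are assumed for this sketch.
METAPHOR_TAGS = {"MRW", "WIDLLI"}


def metaphor_density(tokens):
    """Return {register: share of lexical units counted as metaphor-related}.

    `tokens` is an iterable of (register, tag) pairs, one per lexical unit.
    `tag` is None for units that carry no metaphor annotation.
    """
    totals = Counter()
    metaphorical = Counter()
    for register, tag in tokens:
        totals[register] += 1
        if tag in METAPHOR_TAGS:
            metaphorical[register] += 1
    return {register: metaphorical[register] / totals[register]
            for register in totals}


# Toy data invented for illustration; not taken from the corpus.
sample = [
    ("news", "MRW"), ("news", None), ("news", "MFlag"), ("news", "WIDLLI"),
    ("fiction", None), ("fiction", "MRW"),
]
densities = metaphor_density(sample)
```

Keeping the set of counted tags in one constant makes it easy to see how the density changes if WIDLLI tokens were excluded instead.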

Abbreviations

1pl: First person plural pronoun

3sg: Third person singular pronoun

adv: Adverbial marker (de 地)

asp: Aspect marker

assoc: Associative (-de 的)

cl: Classifier

cop: Copular

gen: Genitive (-de 的)

neg: Negator

nom: Nominalizer

References

  • Ahrens, K., Chung, S.-F., & Huang, C.-R. (2003). Conceptual metaphors: Ontology-based representation and corpora driven mapping principles. In Proceedings of the ACL 2003 workshop on the lexicon and figurative language (pp. 36–42). Stroudsburg, PA: Association for Computational Linguistics.

  • Ahrens, K., Chung, S.-F., & Huang, C.-R. (2004). From lexical semantics to conceptual metaphors: Mapping principle verification with WordNet and SUMO. In Proceedings of the fifth Chinese lexical semantics workshop (pp. 99–106). Singapore: COLIPS.

  • Badryzlova, Y., Shekhtman, N., Isaeva, Y., & Kerimov, R. (2013). Annotating a Russian corpus of conceptual metaphor: A bottom-up approach. In Proceedings of the first workshop on metaphor in NLP (pp. 77–86). Stroudsburg, PA: Association for Computational Linguistics.

  • Cameron, L. (2003). Metaphor in educational discourse. New York/London: Continuum.

  • Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.

  • Charteris-Black, J. (2004). Corpus approaches to critical metaphor analysis. New York: Palgrave Macmillan.

  • Chen, K.-J., & Bai, M.-H. (1998). Unknown word detection for Chinese by a corpus-based learning method. Computational Linguistics and Chinese Language Processing, 3(1), 27–44.

  • Chen, K.-J., & Liu, S.-H. (1992). Word identification for Mandarin Chinese sentences. In Proceedings of the fourteenth conference on computational linguistics (pp. 101–107). Stroudsburg, PA: Association for Computational Linguistics.

  • Chiang, W.-Y., & Duann, R.-F. (2007). Conceptual metaphors for SARS: ‘War’ between whom? Discourse and Society, 18(5), 579–602.

  • Chiu, S.-H., & Chiang, W.-Y. (2011). FIGHT metaphors in legal discourse: What is unsaid in the story? Language and Linguistics, 12(4), 877–915.

  • Chung, S.-F., Ahrens, K., & Huang, C.-R. (2005). Source domains as concept domains in metaphorical expressions. Computational Linguistics and Chinese Language Processing, 10(4), 553–570.

  • Chung, S.-F., & Huang, C.-R. (2010). Using collocations to establish the source domains of conceptual metaphors. Journal of Chinese Linguistics, 38(2), 183–223.

  • Deignan, A. (2005). Metaphor and corpus linguistics. Amsterdam/Philadelphia: John Benjamins.

  • Deignan, A. (2015). MIP, the corpus, and dictionaries: What makes the best for metaphor analysis? Metaphor and the Social World, 5(1), 145–154.

  • Do Dinh, E.-L., & Gurevych, I. (2016). Token-level metaphor detection using neural networks. In Proceedings of the fourth workshop on metaphor in NLP (pp. 28–33). Stroudsburg, PA: Association for Computational Linguistics.

  • Dorst, A. G. (2011). Personification in discourse: Linguistic forms, conceptual structures and communicative functions. Language and Literature, 20(2), 113–135.

  • Duann, R.-F., & Huang, C.-R. (2015). When embodiment meets Generative Lexicon: The human body part metaphors in Sinica Corpus. In Proceedings of the twenty-ninth Pacific Asia conference on language, information and computation (pp. 396–403).

  • Dunn, J. (2013). What metaphor identification systems can tell us about metaphor-in-language. In Proceedings of the first workshop on metaphor in NLP (pp. 1–10). Stroudsburg, PA: Association for Computational Linguistics.

  • Gibbs, R. W. (Ed.). (2008). The Cambridge handbook of metaphor and thought. Cambridge/New York: Cambridge University Press.

  • Goatly, A. (1997). The language of metaphors. New York: Routledge.

  • Gong, S.-P., Ahrens, K., & Huang, C.-R. (2008). Chinese Word Sketch and mapping principles: A corpus-based study of conceptual metaphors using the BUILDING source domain. International Journal of Computer Processing of Languages, 21(1), 3–17.

  • Gray, B. (2010). On the use of demonstrative pronouns and determiners as cohesive devices: A focus on sentence-initial this/these in academic prose. Journal of English for Academic Purposes, 9(3), 167–183.

  • Haagsma, H., & Bjerva, J. (2016). Detecting novel metaphor using selectional preference information. In Proceedings of the fourth workshop on metaphor in NLP (pp. 10–17). Stroudsburg, PA: Association for Computational Linguistics.

  • Han, C. (2014). Metaphor and entertainment: A corpus-based approach to language in Chinese online news. New York: Palgrave Macmillan.

  • Hsieh, S. C.-Y. (2006). A corpus-based study on animal expressions in Mandarin Chinese and German. Journal of Pragmatics, 38, 2206–2222.

  • Huang, C.-R., & Ahrens, K. (2003). Individuals, kinds and events: Classifier coercion of nouns. Language Sciences, 25(4), 353–373.

  • Huang, C.-R., Chen, K.-J., Chen, F.-Y., & Chang, L.-L. (1997). Segmentation standard for Chinese natural language processing. Computational Linguistics and Chinese Language Processing, 2(2), 47–62.

  • Huang, C.-R., Hsieh, S.-K., Hong, J.-F., Chen, Y.-Z., Su, I.-L., Chen, Y.-X., et al. (2010). 中文词汇网络: 跨语言知识处理基础架构的设计理念与实践 [Chinese WordNet: Design, implementation, and application of an infrastructure for cross-lingual knowledge processing]. 中文信息学报 [Journal of Chinese Information Processing], 24(3), 14–23.

  • Huang, C.-R., & Xue, N. (2015). Modeling word concepts without convention: Linguistic and computational issues in Chinese word identification. In W. S.-Y. Wang & C.-F. Sun (Eds.), The Oxford handbook of Chinese linguistics (pp. 348–361). New York: Oxford University Press.

  • Jing-Schmidt, Z., & Peng, X. (2017). Winds and tigers: Metaphor choice in China’s anti-corruption discourse. Lingua Sinica. doi:10.1186/s40655-016-0017-9.

  • Kövecses, Z. (2005). Metaphor in culture: Universality and variation. Cambridge/New York: Cambridge University Press.

  • Kövecses, Z. (2010). Metaphor: A practical introduction (2nd ed.). New York: Oxford University Press.

  • Lakoff, G., & Johnson, M. (1980/2003). Metaphors we live by. Chicago: University of Chicago Press.

  • Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to western thought. New York: Basic Books.

  • Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago/London: University of Chicago Press.

  • Lee, L.-H., Hsieh, S.-K., & Huang, C.-R. (2009). CWN-LMF: Chinese WordNet in the lexical markup framework. In Proceedings of the seventh workshop on Asian language resources (pp. 123–130). Stroudsburg, PA: Association for Computational Linguistics.

  • Li, C. N., & Thompson, S. A. (1981). Mandarin Chinese: A functional reference grammar. Berkeley: University of California Press.

  • Lu, X. (2007). A hybrid model for Chinese word segmentation. Journal for Language Technology and Computational Linguistics, 22(1), 71–88.

  • Lu, X. (2008). Improving part-of-speech guessing of Chinese unknown words using hybrid models. International Journal of Corpus Linguistics, 13(2), 169–193.

  • Lu, L. W.-L., & Ahrens, K. (2008). Ideological influence on BUILDING metaphors in Taiwanese presidential speeches. Discourse and Society, 19(3), 383–408.

  • McEnery, T., & Xiao, R. (2004). The Lancaster Corpus of Mandarin Chinese: A corpus for monolingual and contrastive language study. In M. Lino, M. Xavier, F. Ferreira, R. Costa & R. Silva (Eds.), Proceedings of the fourth international conference on language resources and evaluation (pp. 1175–1178). Lisbon: ELRA.

  • Nacey, S. (2013). Metaphors in learner English. Amsterdam/Philadelphia: John Benjamins.

  • Packard, J. L. (2000). The morphology of Chinese: A linguistic and cognitive approach. Cambridge/New York: Cambridge University Press.

  • Partington, A. (2006). Metaphors, motifs and similes across discourse types: Corpus-Assisted Discourse Studies (CADS) at work. In A. Stefanowitsch & S. T. Gries (Eds.), Corpus-based approaches to metaphor and metonymy (pp. 267–304). Berlin/New York: Mouton de Gruyter.

  • Pasma, T. (2012). Metaphor identification in Dutch discourse. In F. MacArthur, J. L. Oncins-Martínez, M. Sánchez-García, & A. M. Piquer-Píriz (Eds.), Metaphor in use: Context, culture, and communication (pp. 69–83). Amsterdam/Philadelphia: John Benjamins.

  • Pragglejaz Group. (2007). MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1), 1–39.

  • Ross, C. (1990). Resultative verb compounds. Journal of the Chinese Language Teachers Association, 25(3), 61–83.

  • Semino, E., Heywood, J., & Short, M. (2004). Methodological problems in the analysis of metaphors in a corpus of conversations about cancer. Journal of Pragmatics, 36(7), 1271–1294.

  • Shi, Y. (2002). The establishment of modern Chinese grammar: The formation of the resultative construction and its effects. Amsterdam/Philadelphia: John Benjamins.

  • Shutova, E., Devereux, B. J., & Korhonen, A. (2013). Conceptual metaphor theory meets the data: A corpus-based human annotation study. Language Resources and Evaluation, 47(4), 1261–1284.

  • Steen, G. J. (2007). Finding metaphor in grammar and usage. Amsterdam/Philadelphia: John Benjamins.

  • Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., & Krennmayr, T. (2010a). Metaphor in usage. Cognitive Linguistics, 21(4), 765–796.

  • Steen, G. J., Dorst, A. G., Herrmann, J. B., Kaal, A. A., Krennmayr, T., & Pasma, T. (2010b). A method for linguistic metaphor identification: From MIP to MIPVU. Amsterdam/Philadelphia: John Benjamins.

  • Stefanowitsch, A. (2006). Corpus-based approaches to metaphor and metonymy. In A. Stefanowitsch & S. T. Gries (Eds.), Corpus-based approaches to metaphor and metonymy (pp. 1–16). Berlin/New York: Walter de Gruyter.

  • Stefanowitsch, A., & Gries, S. T. (Eds.). (2006). Corpus-based approaches to metaphor and metonymy. Berlin/New York: Mouton de Gruyter.

  • Sun, C. (2006). Chinese: A linguistic introduction. New York: Cambridge University Press.

  • Sun, M., & Zou, J. (2001). 汉语自动分词研究评述 [A critical appraisal of the research on Chinese word segmentation]. 当代语言学 [Contemporary Linguistics], 3(1), 22–32.

  • Tai, J. H.-Y. (1994). Chinese classifier systems and human categorization. In M. Chen & O. Tzeng (Eds.), In honor of William S.-Y. Wang: Interdisciplinary studies on language and language change (pp. 479–494). Taipei: Pyramid Press.

  • Tai, J. H.-Y., & Chao, F.-Y. (1994). A semantic study of the classifier zhang. Journal of the Chinese Language Teachers Association, 29(3), 67–78.

  • Tao, L. (1996). Topic discontinuity and zero anaphora in Chinese discourse. In B. A. Fox (Ed.), Studies in anaphora (pp. 487–513). Amsterdam/Philadelphia: John Benjamins.

  • Tay, D. (2015). Metaphor in case study articles on Chinese university counseling service websites. Chinese Language and Discourse, 6(1), 28–56.

  • Thompson, S. A. (1973). Resultative verb compounds in Mandarin Chinese: A case for lexical rules. Language, 49(2), 361–379.

  • Veale, T., Shutova, E., & Beigman Klebanov, B. (2016). Metaphor: A computational perspective. San Rafael, CA: Morgan & Claypool Publishers.

  • Xiao, R. (2017). Lancaster Corpus of Mandarin Chinese. In R. Sybesma (Ed.), Encyclopedia of Chinese language and linguistics. Amsterdam: Brill.

  • Yu, N. (1998). The contemporary theory of metaphor: A perspective from Chinese. Amsterdam/Philadelphia: John Benjamins.

  • Yu, N. (2008). Metaphor from body and culture. In R. W. Gibbs (Ed.), The Cambridge handbook of metaphor and thought (pp. 247–261). Cambridge/New York: Cambridge University Press.

  • Yu, N., & Jia, D. (2016). Metaphor in culture: LIFE IS A SHOW in Chinese. Cognitive Linguistics, 27(2), 147–180.

  • Zhang, H. (2007). Numeral classifiers in Mandarin Chinese. Journal of East Asian Linguistics, 16(1), 43–59.

  • Zhang, H.-P., & Liu, Q. (2002). 基于 N-最短路径方法的中文词语粗分模型 [Model of Chinese words rough segmentation based on N-shortest-paths method]. 中文信息学报 [Journal of Chinese Information Processing], 16(5), 1–7.

Chinese Dictionaries

  • Commercial Press Dictionary Research Centre. (2010). Xiandai Hanyu Xuexi Cidian [A Learner’s Dictionary of Modern Chinese]. Beijing: Commercial Press.

  • Li, X. (Ed.). (2014). Xiandai Hanyu Guifan Cidian (3rd ed.) [A Standard Dictionary of Modern Chinese]. Beijing: Foreign Language Teaching and Research Press.

Chinese Lexical Resources

Acknowledgements

We are grateful to the anonymous reviewers for their insightful comments and suggestions on earlier drafts of this manuscript. This project was funded by a Gil Watz Early Career Professorship in Language and Linguistics to the first author. Special thanks go to Chan-Chia Hsu and Eric Po-Chung Lin for their participation in the inter-annotator reliability test. We would also like to thank Haiyang Ai for his assistance at the early stage of this project. Any remaining inadequacies are our sole responsibility.

Author information

Corresponding author

Correspondence to Xiaofei Lu.

Cite this article

Lu, X., Wang, B. P.-Y. Towards a metaphor-annotated corpus of Mandarin Chinese. Lang Resources & Evaluation 51, 663–694 (2017). https://doi.org/10.1007/s10579-017-9392-9

  • DOI: https://doi.org/10.1007/s10579-017-9392-9
