Abstract
This study applies AI technology to build corpora of academic Chinese. Python was used to extract candidate lexical chunks of several lengths: 3-grams, 4-grams, 5-grams, and 6-grams. The chunks were identified with the New-MI algorithm and then filtered for semantic relevance and completeness. Duplicate entries were subsequently removed by hand, yielding 1,431 continuous word chunks. These chunks were classified by function into three categories: research-oriented, text-oriented, and participant-oriented. The comparison showed differences in chunk use between Korean learners of Chinese and native Chinese writers. In both groups, research-oriented chunks were used more frequently than the other two categories, but the Korean learners used chunks of all three categories less frequently than the native speakers did. The findings may provide a reference for academic Chinese writing and for the development of academic Chinese textbooks for learners of Chinese.
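The extraction step described above can be illustrated with a minimal Python sketch. It only shows how contiguous 3-gram to 6-gram candidates might be collected and counted. It does not implement the New-MI algorithm or the semantic filtering, whose details are not given here. The function names (`extract_ngrams`, `count_chunks`, `per_million`), the frequency threshold of 2, and the toy sentences are assumptions made for illustration, and the input is assumed to be already segmented into words.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def extract_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram in one tokenized sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_chunks(sentences: Iterable[List[str]],
                 sizes: Sequence[int] = (3, 4, 5, 6)) -> Counter:
    """Count all n-grams of the given sizes across the sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in sizes:
            counts.update(extract_ngrams(tokens, n))
    return counts


def per_million(count: int, corpus_tokens: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    return count / corpus_tokens * 1_000_000


# Toy pre-segmented sentences (hypothetical, not taken from the study's corpus).
sentences = [
    ["本文", "的", "研究", "目的", "是", "分析", "语块"],
    ["本文", "的", "研究", "目的", "是", "比较", "语料"],
]

counts = count_chunks(sentences)

# Keep candidates that recur; the threshold of 2 is an illustrative assumption.
candidates = {gram: c for gram, c in counts.items() if c >= 2}

total_tokens = sum(len(s) for s in sentences)
for gram, c in sorted(candidates.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" ".join(gram), c, round(per_million(c, total_tokens), 1))
```

Normalizing to a per-million rate is one common way to compare chunk frequencies across corpora of different sizes, such as a learner corpus and a native-writer corpus. Whether the study normalized its counts this way is not stated in the abstract.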
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, Q., Mou, L. (2024). A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_21
DOI: https://doi.org/10.1007/978-981-97-0586-3_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0585-6
Online ISBN: 978-981-97-0586-3
eBook Packages: Computer Science (R0)