A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application

Zhou, Qihong; Mou, Li

doi:10.1007/978-981-97-0586-3_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14515)

Included in the following conference series:

Workshop on Chinese Lexical Semantics

Abstract

This study applies AI technology to build corpora of academic Chinese. Python was used to extract candidate lexical chunks of four lengths: 3-grams, 4-grams, 5-grams, and 6-grams. The candidates were identified with the New-MI algorithm and filtered for semantic relevance and completeness. Manual review then removed duplicate entries, yielding 1431 continuous word chunks. These chunks were classified into three functional categories: research-oriented, text-oriented, and participant-oriented. Korean learners of Chinese and native Chinese writers differed in their use of chunks: both groups used research-oriented chunks more often than the other two categories, but the Korean learners used all three categories less frequently than the native writers. The findings may serve as a reference for academic Chinese writing and for the development of academic Chinese textbooks for learners of Chinese.
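
The candidate-extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the text has already been word-segmented, it ranks candidates by raw frequency only, and it does not implement the New-MI scoring or the semantic filtering. The function names, the `min_freq` threshold, and the sample sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(tokens: Sequence[str],
                   n_values: Iterable[int] = (3, 4, 5, 6)) -> Counter:
    """Count every contiguous n-gram of the requested lengths in one token sequence."""
    counts: Counter = Counter()
    for n in n_values:
        # range() is empty when the sequence is shorter than n
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts


def frequent_chunks(sentences: Iterable[Sequence[str]],
                    min_freq: int = 2) -> Dict[NGram, int]:
    """Keep n-grams whose corpus-wide frequency reaches min_freq (assumed threshold)."""
    totals: Counter = Counter()
    for tokens in sentences:
        totals.update(extract_ngrams(tokens))
    return {gram: freq for gram, freq in totals.items() if freq >= min_freq}


# Hypothetical, pre-segmented sample sentences (illustration only).
sample = [
    ["研究", "结果", "表明", "学习者", "使用", "较少"],
    ["研究", "结果", "表明", "母语者", "使用", "较多"],
]
candidates = frequent_chunks(sample, min_freq=2)
print(candidates)
```

In a fuller pipeline, the frequency filter would be followed by an association measure and by the manual review that the abstract mentions; counting n-grams within sentence boundaries, as here, keeps chunks from spanning two sentences.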

Author information

Authors and Affiliations

College of Chinese Language and Culture, Sichuan International Studies University, Chongqing, 400031, China
Qihong Zhou
School of Education, Faculty of Social Sciences and Humanities, Universiti Teknologi Malaysia, 81310, Johor, UTM Johor Bahru, Malaysia
Li Mou
School of General Education, Chongqing City Management College, Chongqing, 401331, China
Li Mou

Corresponding author

Correspondence to Qihong Zhou.

Editor information

Editors and Affiliations

Institute for Infocomm Research, Singapore, Singapore
Minghui Dong
National Taiwan Normal University, Taipei, Taiwan
Jia-Fei Hong
Nanyang Technological University, Singapore, Singapore
Jingxia Lin
Leshan Normal University, Leshan, China
Peng Jin

About this paper

Cite this paper

Zhou, Q., Mou, L. (2024). A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_21

DOI: https://doi.org/10.1007/978-981-97-0586-3_21
Published: 28 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0585-6
Online ISBN: 978-981-97-0586-3
eBook Packages: Computer Science, Computer Science (R0)
