A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application

  • Conference paper
Chinese Lexical Semantics (CLSW 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14515)

Abstract

This study applies AI technology to build academic Chinese corpora. Python was used to extract lexical chunks of four lengths: 3-grams, 4-grams, 5-grams, and 6-grams. Candidate chunks were identified with the New-MI algorithm and then filtered for semantic relevance and completeness. Manual review then removed duplicate entries, yielding 1431 continuous word chunks. These chunks were classified into three functional categories: research-oriented, text-oriented, and participant-oriented. Korean learners of Chinese and native Chinese writers differed in their use of chunks, although both groups used research-oriented chunks more frequently than chunks of the other two categories. The Korean learners used chunks of all three categories less frequently than the native writers did. This study may serve as a reference for academic Chinese writing instruction and for the development of academic Chinese textbooks for learners of Chinese.
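The extraction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and it does not implement the New-MI algorithm. It only counts contiguous 3- to 6-grams over text that is assumed to be word-segmented already, and keeps those above a frequency threshold. The function names `extract_ngrams` and `frequent_chunks`, the `min_freq` cutoff, and the two sample sentences are hypothetical and chosen for illustration only.

```python
from collections import Counter


def extract_ngrams(sentences, min_n=3, max_n=6):
    """Count contiguous word n-grams of length min_n..max_n.

    `sentences` is a list of token lists; word segmentation is assumed
    to have been done upstream (hypothetical input format).
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(min_n, max_n + 1):
            # Slide a window of width n across the sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def frequent_chunks(counts, min_freq=2):
    """Keep n-grams seen at least `min_freq` times (illustrative cutoff)."""
    return {gram: c for gram, c in counts.items() if c >= min_freq}


# Two made-up, pre-segmented sentences used only as sample input.
sentences = [
    ["本文", "的", "研究", "结果", "表明"],
    ["本文", "的", "研究", "目的", "是"],
]
counts = extract_ngrams(sentences)
candidates = frequent_chunks(counts, min_freq=2)
```

In a pipeline like the one the abstract outlines, a raw frequency filter of this kind would only produce candidates. An association measure and manual review would still be needed before anything counts as a lexical chunk.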

Author information

Corresponding author

Correspondence to Qihong Zhou.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhou, Q., Mou, L. (2024). A Corpus-Based Study of Lexical Chunks in Chinese Academic Discourse: Extraction, Classification, and Application. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_21

  • DOI: https://doi.org/10.1007/978-981-97-0586-3_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0585-6

  • Online ISBN: 978-981-97-0586-3

  • eBook Packages: Computer Science (R0)
