Abstract
This paper describes the process of identifying lexical bundles, i.e., frequently recurring word sequences such as "by means of" and "in the end of", in secondary school history and physics textbooks. To find genuine lexical bundles, i.e., to locate the boundaries between a bundle and the arbitrary words surrounding it, the paper proposes a new approach to the problem of extracting overlapping bundles of different lengths. The structural classification indicates that history uses more NP/PP-based and fewer dependent-clause-based bundles than physics. The comparative analysis narrows this difference down to the referential function: history refers almost exclusively to phrases, i.e., within clauses, while physics is much more inclined to make references across clauses. The article also reports on an extension of the study, ongoing work that focuses on the automatic identification of multi-word expressions in general.
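As an illustration of what "frequently recurring word sequences" can mean operationally, the following minimal sketch counts fixed-length word n-grams and keeps those whose frequency, normalised per million words, reaches a cut-off. It is a generic frequency-threshold baseline, not the approach proposed in the paper, and it does not address overlapping bundles of different lengths. The function names, the whitespace tokenisation, and the threshold values are assumptions chosen for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-token window as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def extract_bundles(tokens, n=4, min_per_million=20.0):
    """Return {bundle: frequency per million words} for frequent n-grams.

    Hypothetical helper: `n` and `min_per_million` are illustrative
    defaults, not values taken from the paper.
    """
    counts = Counter(ngrams(tokens, n))
    # Scale raw counts to occurrences per million running words.
    scale = 1_000_000 / max(len(tokens), 1)
    return {
        " ".join(gram): count * scale
        for gram, count in counts.items()
        if count * scale >= min_per_million
    }


if __name__ == "__main__":
    # Toy input; a real study would use a tokenised textbook corpus.
    toy = "by means of the rod by means of the coil".split()
    print(extract_bundles(toy, n=4, min_per_million=150_000))
```

Running the extraction separately for each bundle length yields overlapping candidates (a four-word bundle contains two three-word sequences), which is the difficulty the paper's boundary-finding approach is meant to handle.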
The research reported here was supported in part by the Swedish Research Council (through the project Swedish FrameNet++, VR dnr 2010–6013), and by the University of Gothenburg (through its support of the Centre for Language Technology and Språkbanken/the Swedish Language Bank).
Notes
- 1. This is a bit like attempting to discover the words in un-word segmented text by looking at the frequency of, e.g., four-character sequences, which seems to be an exercise of doubtful value.
- 4. A normalisation gives 30 and 50 times/MW for physics and history respectively.
© 2014 Springer International Publishing Switzerland
Cite this paper
Ribeck, J., Borin, L. (2014). Lexical Bundles in Swedish Secondary School Textbooks. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_20
DOI: https://doi.org/10.1007/978-3-319-08958-4_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science (R0)