
Lexical Bundles in Swedish Secondary School Textbooks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

The present paper describes the process of identifying lexical bundles, i.e. frequently recurring word sequences such as "by means of" and "in the end of", in secondary school history and physics textbooks. To determine genuine lexical bundles, i.e. to find the word boundaries between lexical bundles and the surrounding arbitrary words, it proposes a new approach to the problem of extracting overlapping bundles of different lengths. The results of the structural classification indicate that history uses more NP/PP-based and fewer dependent-clause-based bundles than physics. The comparative analysis narrows this difference down to the referential function: history refers almost exclusively to phrases, i.e. within clauses, whereas physics is much more inclined to make references across clauses. The article also reports on an extension of the study, ongoing work that focuses on the automatic identification of multi-word expressions in general.
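For orientation only, the following minimal Python sketch shows a common frequency-based baseline for the extraction task described above. It is not the authors' proposed approach. It counts all contiguous 2- to 4-word sequences, keeps those at or above a frequency threshold, and then applies a simple overlap heuristic: a bundle is dropped when a longer bundle that contains it has the same frequency. The function names, the length range and the threshold `min_freq` are illustrative assumptions. Lexical bundle studies commonly also use normalised frequency cut-offs and dispersion criteria, which this sketch omits.

```python
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def count_ngrams(tokens: List[str], n_min: int = 2, n_max: int = 4) -> Counter:
    """Count every contiguous word sequence of length n_min..n_max."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def candidate_bundles(tokens: List[str], min_freq: int = 2,
                      n_min: int = 2, n_max: int = 4) -> Dict[NGram, int]:
    """Keep only sequences whose raw frequency reaches min_freq (hypothetical threshold)."""
    counts = count_ngrams(tokens, n_min, n_max)
    return {g: c for g, c in counts.items() if c >= min_freq}


def contains(longer: NGram, shorter: NGram) -> bool:
    """True if `shorter` occurs as a contiguous subsequence of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def drop_subsumed(bundles: Dict[NGram, int]) -> Dict[NGram, int]:
    """Heuristic overlap filter: drop a bundle when some longer bundle
    containing it has exactly the same frequency."""
    kept: Dict[NGram, int] = {}
    for g, c in bundles.items():
        subsumed = any(len(h) > len(g) and hc == c and contains(h, g)
                       for h, hc in bundles.items())
        if not subsumed:
            kept[g] = c
    return kept


if __name__ == "__main__":
    toy = "by means of the pump and by means of the valve".split()
    print(drop_subsumed(candidate_bundles(toy, min_freq=2)))
    # {('by', 'means', 'of', 'the'): 2}
```

In this toy input, the heuristic keeps "by means of the" and discards "by means of", even though the trailing "the" is arguably just an arbitrary neighbouring word. This is the kind of boundary problem that the abstract says the paper's own approach addresses.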

The research reported here was supported in part by the Swedish Research Council (through the project Swedish FrameNet++, VR dnr 2010–6013), and by the University of Gothenburg (through its support of the Centre for Language Technology and Språkbanken/the Swedish Language Bank).

Notes

  1. This is a bit like attempting to discover the words in text without word segmentation by looking at the frequency of, e.g., four-character sequences, which seems to be an exercise of doubtful value.

  2. It is not very obvious, however, what can be concluded about the language system or the mental lexicon of the language user from the attested high text frequency of a sequence like "in the case of the", cited by Biber and Conrad [11]. See Sect. 2.1.

  3. http://www.antlab.sci.waseda.ac.jp

  4. A normalisation gives 30 and 50 occurrences per million words (MW) for physics and history, respectively.

  5. http://spraakbanken.gu.se/eng/resource/saldo
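Note 4 reports frequencies normalised per million words (MW), which makes counts from corpora of different sizes comparable. Here is a minimal sketch of that arithmetic. It assumes the usual definition: the raw count divided by the corpus size in words, multiplied by one million. The function name and the example figures are hypothetical and are not taken from the study.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Return a raw frequency normalised to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply first so the integer product is exact before the single division.
    return raw_count * 1_000_000 / corpus_size


# Hypothetical figures: 3 occurrences in a 200,000-word corpus.
print(per_million(3, 200_000))  # 15.0
```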

References

  1. Cummins, J.: The exit and entry fallacy in bilingual education. NABE J. 4(3), 25–29 (1980)

  2. Macken-Horarik, M.: Literacy and learning across the curriculum: towards a model of register for secondary school teachers. In: Hasan, R., Williams, G. (eds.) Literacy in Society, pp. 232–279. Longman, London (1996)

  3. Halliday, M.A.K., Martin, J.: Writing Science: Literacy and Discursive Power. Falmer Press, London (1993)

  4. Gardner, D.: Validating the construct of word in applied corpus-based vocabulary research: a critical survey. Appl. Linguist. 28(2), 241–265 (2007)

  5. Cortes, V.: Lexical bundles in published and student disciplinary writing: examples from history and biology. Engl. Specif. Purp. 23, 397–423 (2004)

  6. Strunkyte, G., Jurkunaite, E.: Written Academic Discourse: Lexical Bundles in Humanities and Natural Sciences. Vilnius University, Vilnius (2008)

  7. Jackendoff, R.: The Architecture of the Language Faculty. MIT Press, Cambridge (1997)

  8. Atkins, S.B.T., Rundell, M.: Guide to Practical Lexicography. Oxford University Press, London (2008)

  9. Chen, Y.H., Baker, P.: Lexical bundles in L1 and L2 writing. Lang. Learn. Technol. 14(2), 30–49 (2010)

  10. Nekrasova, T.M.: English L1 and L2 speakers' knowledge of lexical bundles. Lang. Learn. 59(3), 647–686 (2009)

  11. Biber, D., Conrad, S.: Lexical bundles in conversation and academic prose. In: Hasselgård, H., Oksefjell, S. (eds.) Out of Corpora: Studies in Honor of Stig Johansson, pp. 181–189. Rodopi, Amsterdam (1999)

  12. Biber, D., Barbieri, F.: Lexical bundles in university spoken and written registers. Engl. Specif. Purp. 26, 263–286 (2007)

  13. Wermter, J., Hahn, U.: You can't beat frequency (unless you use linguistic knowledge): a qualitative evaluation of association measures for collocation and term extraction. In: Proceedings of COLING-ACL 2006, Sydney, ACL, pp. 785–792 (2006)

  14. Pecina, P.: Lexical association measures and collocation extraction. Lang. Resour. Eval. 44, 137–158 (2010)

  15. Hyland, K.: As can be seen: lexical bundles and disciplinary variation. Engl. Specif. Purp. 27, 4–21 (2008)

  16. Biber, D., Conrad, S., Cortes, V.: If you look at...: lexical bundles in university teaching and textbooks. Appl. Linguist. 25(3), 371–405 (2004)

  17. Wray, A.: Formulaic Language and the Lexicon. Cambridge University Press, Cambridge (2006)

  18. McEnery, T., Xiao, R., Tono, Y.: Corpus-Based Language Studies: An Advanced Resource Book. Routledge, London (2006)

  19. Lindberg, I., Kokkinakis, S.J.: OrdiL - en korpusbaserad kartläggning av ordförrådet i läromedel för grundskolans senare år. Göteborgs universitet, Göteborg (2007)

  20. Ribeck, J.C.: Identifying lexical bundles in secondary school textbooks. In: Vetulani, Z. (ed.) Proceedings of the 5th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, pp. 202–206 (2011)

  21. Borin, L., Östling, R., Ribeck, J., Wirén, M.: Towards unsupervised extraction of syntactico-semantic patterns (in progress)

  22. Sag, I.A., Baldwin, T., Bond, F., Copestake, A., Flickinger, D.: Multiword expressions: a pain in the neck for NLP. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, p. 1. Springer, Heidelberg (2002)

  23. Villavicencio, A., Bond, F., Korhonen, A., McCarthy, D.: Introduction to the special issue on multiword expressions: having a crack at a hard nut. Comput. Speech Lang. 19, 365–377 (2005)

  24. Rayson, P., Piao, S., Sharoff, S., Evert, S., Villada Moirón, B.: Multiword expressions: hard going or plain sailing? Lang. Resour. Eval. 44, 1–5 (2010)

  25. Borin, L., Dannélls, D., Forsberg, M., Kokkinakis, D., Gronostaj, M.T.: The past meets the present in Swedish FrameNet++. In: 14th EURALEX International Congress, Leeuwarden, EURALEX, pp. 269–281 (2010)

  26. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley FrameNet project. In: Proceedings of the Conference on COLING-ACL '98, Montreal, ACL, pp. 86–90 (1998)

  27. Borin, L., Forsberg, M., Lönngren, L.: SALDO: a touch of yin to WordNet's yang. Lang. Resour. Eval. (2013). doi:10.1007/s10579-013-9233-4

  28. Borin, L., Forsberg, M., Roxendal, J.: Korp - the corpus infrastructure of Språkbanken. In: Proceedings of LREC 2012, Istanbul, ELRA, pp. 474–478 (2012)

  29. Hewlett, D., Cohen, P.: Word segmentation as general chunking. In: Proceedings of CoNLL 2011, Portland, Oregon, ACL, pp. 39–47 (2011)

  30. Hammarström, H., Borin, L.: Unsupervised learning of morphology. Comput. Linguist. 37(2), 309–350 (2011)

  31. Schone, P., Jurafsky, D.: Is knowledge-free induction of multiword unit dictionary headwords a solved problem? In: Proceedings of EMNLP 2001, Pittsburgh, ACL (2001)

  32. Wible, D., Tsao, N.L.: StringNet as a computational resource for discovering and investigating linguistic constructions. In: Proceedings of the NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics, Los Angeles, California, ACL, pp. 25–31 (2010)

  33. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

  34. Baldwin, T., Kim, S.N.: Multiword expressions. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, pp. 267–292. CRC Press, Boca Raton (2010)

Author information

Correspondence to Judy Ribeck.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ribeck, J., Borin, L. (2014). Lexical Bundles in Swedish Secondary School Textbooks. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol. 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_20

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)