Latent Semantic Distance Between Chinese Basic Words and Non-basic Words

  • Conference paper
Chinese Lexical Semantics (CLSW 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8922)

Abstract

What determines the “basicness” of words remains a challenging question in creating basic lexicons and basic wordlists. Since frequency and dispersion appear to be the dominant criteria, the question arises whether contextual factors also help to define the concept of “basicness.” From the perspective of the distributional model, meanings are represented through the interaction between words and their contexts. Hence, this research takes an existing wordlist as a tentative standard of “basicness” and seeks differences between “basic words” and “non-basic words” based on their occurrences in different texts. Two experiments were conducted to answer the research questions. The first calculated the “latent semantic distances” between basic words and non-basic words. The second calculated and examined the “near neighbors” of basic words and non-basic words. The results show that basic words tend to occur in more similar texts than non-basic words do; in addition, the near neighbors of basic words tend to be more “basic” as well. This research contributes a more “contextual” perspective on exploring “basicness.”
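
As a rough illustration of the two measurements named in the abstract, the sketch below builds a latent semantic space from a tiny term-by-document count matrix and then computes cosine distances and near neighbors in that space. It is a minimal sketch under stated assumptions, not the authors' procedure: the words, counts, number of dimensions (k = 2), the use of cosine distance, and the helper names `lsa_term_vectors`, `cosine_distance`, and `nearest_neighbors` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lsa_term_vectors(term_doc, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    # Scale the left singular vectors by the singular values to get term vectors.
    return u[:, :k] * s[:k]


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(a @ b) / denom


def nearest_neighbors(vectors, words, target, n=1):
    """Return the n words whose latent vectors are closest to the target's."""
    i = words.index(target)
    dists = [
        (cosine_distance(vectors[i], vectors[j]), words[j])
        for j in range(len(words))
        if j != i
    ]
    return [w for _, w in sorted(dists)[:n]]


# Toy data (assumption): rows are words, columns are four short texts.
# The first four words occur only in texts 0-1, the last two only in texts 2-3.
words = ["eat", "drink", "rice", "tea", "tax", "audit"]
term_doc = np.array(
    [
        [2, 1, 0, 0],
        [1, 2, 0, 0],
        [2, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 3, 1],
        [0, 0, 1, 3],
    ],
    dtype=float,
)

vecs = lsa_term_vectors(term_doc, k=2)

d_same = cosine_distance(vecs[words.index("eat")], vecs[words.index("drink")])
d_diff = cosine_distance(vecs[words.index("eat")], vecs[words.index("tax")])
print("eat-drink distance:", round(d_same, 3))
print("eat-tax distance:", round(d_diff, 3))
print("near neighbor of 'tax':", nearest_neighbors(vecs, words, "tax"))
```

In this toy space, words that occur in the same texts end up close together and words from disjoint texts end up far apart. Comparing a "basic" and a "non-basic" word set would then amount to averaging such distances over each set, though the actual corpus, weighting scheme, and dimensionality used in the paper are not stated in the abstract.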

Author information

Correspondence to Shanon Yi-Hsin Lin.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Lin, S.YH., Hsieh, SK. (2014). Latent Semantic Distance Between Chinese Basic Words and Non-basic Words. In: Su, X., He, T. (eds) Chinese Lexical Semantics. CLSW 2014. Lecture Notes in Computer Science, vol 8922. Springer, Cham. https://doi.org/10.1007/978-3-319-14331-6_27

  • DOI: https://doi.org/10.1007/978-3-319-14331-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14330-9

  • Online ISBN: 978-3-319-14331-6

  • eBook Packages: Computer Science (R0)
