
An Efficient Chinese Word Segmentation Algorithm for Chinese Information Processing on the Internet

  • Conference paper
Internet Applications (ICSC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1749)

Abstract

This paper proposes a Chinese word segmentation algorithm based on forward maximum matching and word binding force. To support the algorithm, a text corpus of over 63 million characters is used to enrich an 80,000-word lexicon in terms of its word entries and word binding forces. At present, given an input line of text, the word segmentor processes on average 210,000 characters per second on an IBM RISC System/6000 3BT workstation, with a correct word identification rate of 99.74%. The proposed word segmentation algorithm can be applied to process the huge amount of Chinese information on the Internet.
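
The forward maximum matching step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `forward_maximum_match`, the `max_word_len` parameter, and the toy lexicon are all assumptions made for illustration. The word binding force component is left out, since the abstract does not define how it is computed.

```python
def forward_maximum_match(text, lexicon, max_word_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in the lexicon.
    Hypothetical sketch: max_word_len and the single-character fallback
    are illustrative choices, not taken from the paper.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon, assumed for the example only.
toy_lexicon = {"中文", "信息", "处理", "信息处理", "互联网"}
segments = forward_maximum_match("中文信息处理", toy_lexicon)
print(segments)  # ['中文', '信息处理']
```

Because the match is greedy, the longer entry 信息处理 wins over the shorter 信息. Ambiguous cases of this kind are presumably where an additional signal such as word binding force would be consulted.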

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wong, P.K. (1999). An Efficient Chinese Word Segmentation Algorithm for Chinese Information Processing on the Internet. In: Hui, L.C.K., Lee, DL. (eds) Internet Applications. ICSC 1999. Lecture Notes in Computer Science, vol 1749. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46652-9_47

  • DOI: https://doi.org/10.1007/978-3-540-46652-9_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66903-6

  • Online ISBN: 978-3-540-46652-9

  • eBook Packages: Springer Book Archive
