An Efficient Chinese Word Segmentation Algorithm for Chinese Information Processing on the Internet

Wong, P. K.

doi:10.1007/978-3-540-46652-9_47

P. K. Wong

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1749)

Included in the following conference series:

International Computer Science Conference

Abstract

This paper proposes a Chinese word segmentation algorithm based on forward maximum matching and word binding force. To support the algorithm, a text corpus of over 63 million characters is used to enrich an 80,000-word lexicon in terms of its word entries and word binding forces. At present, given an input line of text, the word segmentor processes on average 210,000 characters per second when running on an IBM RISC System/6000 3BT workstation, with a correct word identification rate of 99.74%. The proposed algorithm can be applied to process the huge amount of Chinese information on the Internet.
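
As a rough illustration of the forward maximum matching step named in the abstract, the following Python sketch scans the text left to right and, at each position, takes the longest entry found in the lexicon. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `forward_maximum_match`, the toy lexicon, and the single-character fallback are all hypothetical, and the word binding force component is not shown because the abstract does not define it.

```python
def forward_maximum_match(text, lexicon, max_word_len=None):
    """Greedy left-to-right segmentation using the longest lexicon match.

    Illustrative only: the paper's algorithm also uses word binding forces,
    which this sketch does not model.
    """
    if max_word_len is None:
        # Longest entry in the lexicon bounds the candidate length.
        max_word_len = max((len(w) for w in lexicon), default=1)

    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Assumption: an unmatched single character is emitted as its own word.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon; the paper's lexicon has about 80,000 words.
toy_lexicon = {"中文", "信息", "处理", "信息处理", "互联网"}

print(forward_maximum_match("中文信息处理", toy_lexicon))
# → ['中文', '信息处理']
```

Because the longest match is tried first, "信息处理" is kept as one word rather than split into "信息" and "处理". A purely greedy choice like this can be wrong on ambiguous strings, which is presumably where an additional signal such as word binding force would come in.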

Author information

Authors and Affiliations

Department of Computing Studies, Hong Kong Institute of Vocational Education (Sha Tin), Hong Kong Vocational Training Council, Sha Tin, N.T., Hong Kong
P. K. Wong

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China
Lucas Chi Kwong Hui
Department of Computer Science, Hong Kong University of Science and Technology,
Dik-Lun Lee

About this paper

Cite this paper

Wong, P.K. (1999). An Efficient Chinese Word Segmentation Algorithm for Chinese Information Processing on the Internet. In: Hui, L.C.K., Lee, DL. (eds) Internet Applications. ICSC 1999. Lecture Notes in Computer Science, vol 1749. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-46652-9_47

DOI: https://doi.org/10.1007/978-3-540-46652-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66903-6
Online ISBN: 978-3-540-46652-9
eBook Packages: Springer Book Archive
