Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree

Yang, Wenchuan; Liu, Jian; Yu, Miao

doi:10.1007/978-3-642-41644-6_33

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

A Chinese word segmentation dictionary based on the Double-Array Trie Tree offers high search efficiency, but dynamic insertion is time-consuming. This paper presents iDAT, an improved algorithm for Chinese word segmentation dictionaries based on the Double-Array Trie Tree. After the original dictionary is initialized, we apply a hash process to the index values of the empty sequences in the base array. The final hash table stores, for each empty sequence, the sum of the empty sequences that precede it. The algorithm adopts the jump strategy of the Sunday single-pattern matching algorithm. At a slight and reasonable increase in space cost, iDAT reduces the average time complexity of dynamic insertion into the trie. Practical results show that it performs well.
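The abstract only outlines iDAT, so the following is not the authors' algorithm. It is a minimal Python sketch, under stated assumptions, of the setting iDAT improves: a dynamic double-array trie in which adding a new character under node s can force all of s's children to move to a new base. The base search here jumps directly between free slots taken from a sorted free list, a common double-array heuristic that is similar in spirit to, but not the same as, the paper's hash table over empty sequences combined with Sunday-style jumps. The class name `DoubleArrayTrie`, the dense character-code mapping in `_code`, and the free list are illustrative assumptions, not details from the paper.

```python
import bisect


class DoubleArrayTrie:
    """Dynamic double-array trie sketch (not iDAT). Slot 0 is the root;
    check[i] == -1 marks a free slot; child t of node s under code c has
    t == base[s] + c and check[t] == s; base 0 means "no children yet"."""

    def __init__(self):
        self.base = [1]
        self.check = [0]         # the root owns slot 0
        self.terminal = [False]  # True where a dictionary word ends
        self.codes = {}          # assumed dense alphabet: char -> code >= 1
        self.free = []           # sorted indices of free slots inside the arrays

    def _code(self, ch):
        # Compact codes on first sight, so CJK code points do not bloat the arrays.
        return self.codes.setdefault(ch, len(self.codes) + 1)

    def _ensure(self, size):
        # Grow the arrays; each new slot starts free (appending keeps `free` sorted).
        while len(self.check) < size:
            self.free.append(len(self.check))
            self.base.append(0)
            self.check.append(-1)
            self.terminal.append(False)

    def _is_free(self, i):
        return i >= len(self.check) or self.check[i] == -1

    def _occupy(self, i, parent):
        self._ensure(i + 1)
        self.check[i] = parent
        j = bisect.bisect_left(self.free, i)
        if j < len(self.free) and self.free[j] == i:
            self.free.pop(j)

    def _release(self, i):
        self.base[i], self.check[i], self.terminal[i] = 0, -1, False
        bisect.insort(self.free, i)

    def _children(self, s):
        b = self.base[s]
        return [c for c in self.codes.values()
                if b and b + c < len(self.check) and self.check[b + c] == s]

    def _find_base(self, codes):
        # Jump from free slot to free slot (aligning the smallest code with each)
        # instead of probing every base value; otherwise append past the end.
        c0 = min(codes)
        for p in self.free:
            b = p - c0
            if b >= 1 and all(self._is_free(b + c) for c in codes):
                return b
        return max(len(self.check) - c0, 1)

    def _relocate(self, s, new_code):
        # Move every child of s to a base that also has room for new_code.
        old_base, kids = self.base[s], self._children(s)
        new_base = self._find_base(kids + [new_code])
        for c in kids:
            old, new = old_base + c, new_base + c
            grandkids = self._children(old)
            self._occupy(new, s)
            self.base[new], self.terminal[new] = self.base[old], self.terminal[old]
            for g in grandkids:  # grandchildren must point at the moved node
                self.check[self.base[old] + g] = new
            self._release(old)
        self.base[s] = new_base

    def insert(self, word):
        s = 0
        for ch in word:
            c, b = self._code(ch), self.base[s]
            if b and b + c < len(self.check) and self.check[b + c] == s:
                s = b + c  # edge already present
                continue
            if not b or not self._is_free(b + c):
                self._relocate(s, c)  # also places a node's first child
            t = self.base[s] + c
            self._occupy(t, s)
            s = t
        self.terminal[s] = True

    def contains(self, word):
        s = 0
        for ch in word:
            c, b = self.codes.get(ch), self.base[s]
            if c is None or not b or b + c >= len(self.check) or self.check[b + c] != s:
                return False
            s = b + c
        return self.terminal[s]


trie = DoubleArrayTrie()
for w in ["中国", "中国人", "中文", "人民"]:
    trie.insert(w)
print(trie.contains("中国人"), trie.contains("中"), trie.contains("中华"))  # True False False
```

In this sketch the expensive step of dynamic insertion is `_find_base`, which runs whenever a slot collision forces a relocation. Aligning the smallest child code with each free slot skips runs of occupied slots without probing them one by one; according to the abstract, iDAT pursues the same goal by hashing the empty sequences of the base array and storing their cumulative sums, which a simple free list does not do.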

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, 100876, China
Wenchuan Yang, Jian Liu & Miao Yu

Editor information

Editors and Affiliations

Soochow University, 1 Shizi Street, 215006, Suzhou, China
Guodong Zhou
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Juanzi Li
Institute of Computer Science & Technology, Peking University, 100871, Beijing, China
Dongyan Zhao & Yansong Feng

About this paper

Cite this paper

Yang, W., Liu, J., Yu, M. (2013). Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_33

DOI: https://doi.org/10.1007/978-3-642-41644-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)
