Abstract
A Chinese word segmentation dictionary based on the Double-Array Trie Tree offers high search efficiency, but dynamic insertion is time-consuming. This paper presents iDAT, an improved algorithm based on the Double-Array Trie Tree for Chinese word segmentation dictionaries. After the original dictionary is initialized, we apply a hash process to the index values of the empty sequences in the base array; the resulting hash table stores, for each empty sequence, the sum of the empty sequences that precede it. The algorithm also adopts the jump rule of the Sunday single-pattern matching algorithm. At the cost of a slight and reasonable increase in space, iDAT reduces the average time complexity of dynamic insertion into the Trie Tree. Practical results show that it performs well.
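To make concrete why dynamic insertion into a double-array trie is expensive, here is a minimal sketch of a standard (textbook-style) dynamic double-array trie. It is not the iDAT algorithm, whose details the abstract does not give. The class and method names (`DoubleArrayTrie`, `insert`, `contains`, `_find_base`, `_relocate`) are hypothetical, and the choices of always relocating the current node's children on a conflict and of a first-fit linear search for a new base value are assumptions made for illustration.

```python
FREE = -1  # check[] value for an unused slot
ROOT = 0   # index of the root node


class DoubleArrayTrie:
    """Hypothetical minimal dynamic double-array trie (base/check arrays).

    Textbook-style sketch with a naive first-fit base search; it is NOT
    the iDAT algorithm, whose details are not given in the abstract.
    """

    def __init__(self):
        self.base = [0]
        self.check = [-2]    # root is occupied but has no parent
        self.term = [False]  # term[i] is True if a word ends at node i
        self.codes = {}      # character -> code >= 1, assigned on first use

    def _code(self, ch):
        if ch not in self.codes:
            self.codes[ch] = len(self.codes) + 1
        return self.codes[ch]

    def _ensure(self, i):
        # Grow all arrays so that index i is valid; new slots are free.
        while len(self.base) <= i:
            self.base.append(0)
            self.check.append(FREE)
            self.term.append(False)

    def _is_free(self, i):
        return i >= len(self.check) or self.check[i] == FREE

    def _children(self, s):
        # Codes k for which node s has a child at index base[s] + k.
        b = self.base[s]
        return [k for k in self.codes.values()
                if b + k < len(self.check) and self.check[b + k] == s]

    def _find_base(self, codes):
        # First-fit linear scan for a base where every code hits a free slot.
        # In this sketch, this scan is a major cost of dynamic insertion.
        b = 0
        while not all(self._is_free(b + k) for k in codes):
            b += 1
        return b

    def _relocate(self, s, new_code):
        # Move every child of s to a new base that also leaves room for new_code.
        kids = self._children(s)
        b = self._find_base(kids + [new_code])
        for k in kids:
            old, new = self.base[s] + k, b + k
            self._ensure(new)
            self.base[new], self.term[new] = self.base[old], self.term[old]
            self.check[new] = s
            # Grandchildren keep their positions but must point at the new index.
            for g in self._children(old):
                self.check[self.base[old] + g] = new
            self.base[old], self.check[old], self.term[old] = 0, FREE, False
        self.base[s] = b

    def insert(self, word):
        s = ROOT
        for ch in word:
            c = self._code(ch)
            t = self.base[s] + c
            if t < len(self.check) and self.check[t] == s:
                s = t                  # transition already exists
                continue
            if not self._is_free(t):   # slot owned by another node: conflict
                self._relocate(s, c)
                t = self.base[s] + c
            self._ensure(t)
            self.base[t], self.check[t] = 0, s
            s = t
        self.term[s] = True

    def contains(self, word):
        s = ROOT
        for ch in word:
            if ch not in self.codes:
                return False
            t = self.base[s] + self.codes[ch]
            if t >= len(self.check) or self.check[t] != s:
                return False
            s = t
        return self.term[s]
```

For example, after `t = DoubleArrayTrie()` and `t.insert("中国")`, `t.contains("中国")` is `True` while `t.contains("中")` is `False`. In this sketch `_find_base` tests candidate base values one slot at a time, and every conflict triggers such a scan. According to the abstract, iDAT reduces this cost by hashing the empty sequences of the base array and applying Sunday-style jumps; that optimization is not reproduced here.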
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, W., Liu, J., Yu, M. (2013). Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_33
DOI: https://doi.org/10.1007/978-3-642-41644-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)