
Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Abstract

A Chinese word segmentation dictionary built on a Double-Array Trie Tree offers high search efficiency, but dynamic insertion into it is time-consuming. This paper presents iDAT, an improved algorithm for Chinese word segmentation dictionaries based on the Double-Array Trie Tree. After the original dictionary is initialized, the index values of the empty sequences in the base array are hashed. The resulting hash table stores, for each empty sequence, the sum of the empty sequences that precede it. The algorithm also adopts the jump strategy of the Sunday single-pattern matching algorithm. At the cost of a slight and reasonable increase in space, iDAT reduces the average time complexity of dynamic insertion into the trie. Practical results show that it performs well.
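For readers unfamiliar with the structure, the Python sketch below is a minimal dynamic double-array trie of the general kind the abstract builds on. It is not the paper's iDAT. It assumes several conventions that the abstract does not specify:

- The root is state 1.
- A transition from state s on character code c goes to slot base[s] + c and is valid only when check[slot] == s.
- Empty slots have check == -1.
- Characters receive integer codes in first-seen order.

The class and method names are hypothetical. Lookup costs one array access per character. Insertion, by contrast, may need a new base for a state and the relocation of that state's children. The naive linear scan in _find_base is the kind of insertion cost the abstract says iDAT reduces.

```python
class DoubleArrayTrie:
    """Minimal dynamic double-array trie (illustrative sketch, not iDAT).

    Assumed conventions: slot 0 is unused, the root is state 1, a child of
    state s on code c lives at base[s] + c and is valid only if
    check[base[s] + c] == s, and check == -1 marks an empty slot.
    """

    def __init__(self):
        self.base = [0, 1]
        self.check = [-1, 0]      # root slot is occupied (parent "0")
        self.is_end = [False, False]
        self.codes = {}           # character -> code 1, 2, 3, ...

    def _code(self, ch):
        # Hypothetical coding scheme: first-seen order, starting at 1.
        if ch not in self.codes:
            self.codes[ch] = len(self.codes) + 1
        return self.codes[ch]

    def _ensure(self, size):
        # Grow the parallel arrays with empty slots up to `size`.
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(-1)
            self.is_end.append(False)

    def _is_free(self, i):
        return i >= len(self.check) or self.check[i] == -1

    def _children(self, s):
        # Codes of the existing children of state s.
        return [c for c in self.codes.values()
                if self.base[s] + c < len(self.check)
                and self.check[self.base[s] + c] == s]

    def _find_base(self, codes):
        # Naive linear scan for the smallest base that places every code
        # in an empty slot; this search dominates insertion time.
        b = 1
        while not all(self._is_free(b + c) for c in codes):
            b += 1
        return b

    def _relocate(self, s, new_base, kids):
        # Move each child of s to new_base + c and re-point its own
        # children's check entries at the child's new slot.
        for c in kids:
            old, new = self.base[s] + c, new_base + c
            self._ensure(new + 1)
            self.base[new], self.is_end[new] = self.base[old], self.is_end[old]
            self.check[new] = s
            for g in self._children(old):
                self.check[self.base[old] + g] = new
            self.base[old], self.check[old], self.is_end[old] = 0, -1, False
        self.base[s] = new_base

    def insert(self, word):
        s = 1
        for ch in word:
            c = self._code(ch)
            t = self.base[s] + c
            if t < len(self.check) and self.check[t] == s:
                s = t             # transition already exists
                continue
            if self.base[s] == 0 or not self._is_free(t):
                # No base yet, or a collision: find a base that fits the
                # existing children plus the new code, then move them.
                kids = self._children(s)
                new_base = self._find_base(kids + [c])
                self._relocate(s, new_base, kids)
                t = new_base + c
            self._ensure(t + 1)
            self.check[t] = s
            s = t
        self.is_end[s] = True

    def contains(self, word):
        s = 1
        for ch in word:
            c = self.codes.get(ch)
            if c is None:
                return False
            t = self.base[s] + c
            if t >= len(self.check) or self.check[t] != s:
                return False
            s = t
        return self.is_end[s]
```

Here is an example. Inserting 中国, 中国人, 中华, 人民 and 华人 in that order triggers a relocation of the root's children when 人民 is added. Afterward, contains("中国") is still true and contains("中") is false.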
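According to the abstract, iDAT adopts the jump idea of the Sunday single-pattern matching algorithm. The Python sketch below shows the Sunday algorithm on its own for reference. The abstract does not describe how iDAT transfers these jumps to the base-array search, so only the borrowed technique is shown here. After each alignment, the character just past the current window decides the shift. If that character occurs in the pattern, the window moves so that the pattern's last occurrence of it lines up. Otherwise, the window skips past it entirely. The function name sunday_search is hypothetical.

```python
def sunday_search(text, pattern):
    """Return the start index of every occurrence of pattern in text
    using the Sunday (quick search) single-pattern matching algorithm."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    # Shift for a character: distance from its last occurrence in the
    # pattern to the position just past the pattern's end.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        # Characters absent from the pattern let the window jump m + 1.
        i += shift.get(text[i + m], m + 1)
    return hits
```

Here is an example. Searching 研究生命起源 for 生命 first compares the window 研究. It then sees 生 just past that window and jumps two positions straight to the match at index 2.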
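The abstract describes hashing the positions of empty sequences in the base array and storing, for each one, the sum of the empty sequences before it. It does not give the exact procedure. The Python sketch below is one hypothetical reading, not the paper's iDAT. It works in three steps:

- It indexes maximal runs of empty check slots in a dict keyed by run start.
- It records each run's length and the number of empty slots before that run.
- When a candidate slot is occupied, the base search jumps straight to the next empty run, in the spirit of a Sunday-style skip.

FREE, build_free_runs and find_base are illustrative names, and -1 as the empty-slot marker is an assumption.

```python
from bisect import bisect_right

FREE = -1  # assumed marker for an empty check slot (hypothetical convention)


def build_free_runs(check):
    """Index maximal runs of empty slots in a double-array `check` list.

    Returns (starts, table): `starts` is the sorted list of run start
    indices; `table` is a dict mapping each run start to
    (run_length, empty_before), where empty_before is the number of
    empty slots in all earlier runs.
    """
    starts, table = [], {}
    i, empty_before, n = 0, 0, len(check)
    while i < n:
        if check[i] != FREE:
            i += 1
            continue
        j = i
        while j < n and check[j] == FREE:
            j += 1
        starts.append(i)
        table[i] = (j - i, empty_before)
        empty_before += j - i
        i = j
    return starts, table


def find_base(check, codes):
    """Smallest base b >= 1 with every slot b + c empty or past the end.

    When b + c is occupied, b jumps so that b + c lands on the start of
    the next empty run; every base skipped over would hit an occupied slot.
    """
    starts, table = build_free_runs(check)
    n, b = len(check), 1
    while True:
        for c in codes:
            pos = b + c
            if pos >= n:
                continue  # past the end: the array can grow
            k = bisect_right(starts, pos) - 1
            if k >= 0 and pos < starts[k] + table[starts[k]][0]:
                continue  # pos lies inside an empty run
            nxt = starts[k + 1] if k + 1 < len(starts) else n
            b = nxt - c
            break
        else:
            return b
```

The jump is safe. Every slot between an occupied position and the start of the next empty run is occupied, so no valid base is skipped. In this sketch, empty_before only mirrors the cumulative sums mentioned in the abstract; find_base itself needs just the run starts and lengths. For simplicity, the table is rebuilt on every call rather than maintained incrementally.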

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, W., Liu, J., Yu, M. (2013). Research of an Improved Algorithm for Chinese Word Segmentation Dictionary Based on Double-Array Trie Tree. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_33

  • DOI: https://doi.org/10.1007/978-3-642-41644-6_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41643-9

  • Online ISBN: 978-3-642-41644-6

  • eBook Packages: Computer Science, Computer Science (R0)
