ABSTRACT
Word segmentation is a fundamental task in natural language processing, and improving its accuracy remains a key problem. With the popularity of microblogs, accurate word segmentation of microblog text has become an active research topic. However, microblog texts often contain information from multiple related domains, and words that are ambiguous across domains reduce segmentation accuracy. Building on word vectors and branching entropy, this paper proposes a multi-domain global correlation degree branching entropy method for microblog text word segmentation. The model is applied to microblog text on the topic of house prices in Beijing. Its precision, recall, and F-measure are compared with those of the branching entropy model proposed by Zhang [6], and the experimental results show that the proposed method outperforms it.
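For readers unfamiliar with branching entropy, the following is a minimal sketch of the standard right branching entropy that such methods build on: for each character n-gram, it measures how unpredictable the next character is, and high values suggest a likely word boundary. It is not the multi-domain global correlation degree variant proposed in the paper, whose details the abstract does not give. The function name `right_branching_entropy` and the parameter `max_len` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(corpus, max_len=4):
    """Return {ngram: entropy in bits of the character that follows it}.

    Illustrative sketch of plain branching entropy only; the paper's
    multi-domain weighting is NOT implemented here.
    """
    # For every n-gram (length 1..max_len), count the characters seen right after it.
    followers = defaultdict(Counter)
    for text in corpus:
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                j = i + n
                if j >= len(text):
                    break  # no following character available
                followers[text[i:j]][text[j]] += 1

    # Shannon entropy of each n-gram's follower distribution.
    entropy = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        entropy[gram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


if __name__ == "__main__":
    scores = right_branching_entropy(["abcabdabe"])
    # "ab" is followed by three different characters; "a" is always followed by "b".
    print(scores["ab"], scores["a"])
```

In a segmenter of this family, a boundary is typically placed after an n-gram whose entropy exceeds a threshold or rises relative to its prefix; the threshold and any domain-specific weighting would have to come from the paper itself.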
- Wei Yang and Longshu Li. 2014. Research on Film Box Office Forecasting Model Based on Weibo Data. Electronic World, 21 (Nov, 2014), 13–16.
- Wenqing Zhao, Xiaoke Hou, and Haihong Sha. 2014. Application of Semantic Rules to Sentiment Analysis of Microblog Hot Topics. CAAI Transactions on Intelligent Systems, 9 (2014), 121–125.
- Dongxia Zhang. 2013. Analysis and Discovery of Network Hotspot Based on Microblog for College Students. Southeast Communication, 6 (2013), 87–89.
- Nianwen Xue. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing, 8 (23, 2003), 29–48.
- Sujian Li, Qun Liu, and Zhiyong Zhang. 2002. Method of Maximum Entropy Model for Language Processing. Computer Science, 7 (29, 2002), 108–110.
- Libang Zhang, Yi Yuan, and Jinfeng Yang. 2014. An Unsupervised Approach to Word Segmentation in Chinese EMRs. Intelligent Computer and Applications, 2 (Jan, 2014), 68–71.
- Fuchun Peng, Fangfang Feng, and Andrew McCallum. 2004. Chinese Segmentation and New Word Detection Using Conditional Random Fields. In Proceedings of International Conference on Computational Linguistics, 2004, 562–568.
- Tao Tang and Qiaoli Zhou. 2011. Term Extraction Based on the Combination of Statistics and Rules. Journal of Shenyang Aerospace University, 5 (28, 2011), 71–74.
- Yang Zheng and Jianwen Mo. 2012. Chinese Word Segmentation Method Based on Professional Term Extraction. Popular Science & Technology, 4 (14, 2012), 20–23.
- Keda He, Zhengtao Zhu, and Yu Cheng. 2016. Research on Text Categorization Based on Improved TF-IDF Algorithm. Journal of Guangdong University of Technology, 5 (33, 2016), 49–53.
Index Terms
- Multi-Domain Global Correlation Degree Branching Entropy Method for Microblog Text Word Segmentation