DOI: 10.1145/3393527.3393540
research-article

Multi-Domain Global Correlation Degree Branching Entropy Method for Microblog Text Word Segmentation

Published: 26 October 2020

ABSTRACT

Word segmentation is a fundamental task in natural language processing, and improving its accuracy is a key problem. With the popularity of microblogs, accurate segmentation of microblog text has become an active research topic. However, microblog texts often contain information from multiple related domains, and words that are ambiguous across domains reduce segmentation accuracy. Building on word vectors and branching entropy, this paper proposes a multi-domain global correlation degree branching entropy method for microblog text word segmentation. The model is applied to microblog text on the topic of house prices in Beijing. Its precision, recall, and F-measure are compared with those of the branching entropy model proposed by Zhang et al. [6], and the experimental results show that the proposed method outperforms it.
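The abstract does not spell out how branching entropy is computed, so the following is only a minimal sketch of the generic quantity: for a character string s, the entropy of the character that follows s in a corpus, H(s) = -Σ P(c | s) log2 P(c | s). A high value suggests that many different characters can follow s, which is commonly read as evidence of a word boundary after s. The function name `right_branching_entropy`, the `max_len` parameter, and the toy corpus are assumptions made for illustration; this is not the multi-domain global correlation degree method proposed in the paper, and it does not use word vectors.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(corpus, max_len=4):
    """Return a dict mapping each substring (1..max_len characters) to the
    entropy, in bits, of the character that immediately follows it.

    Hypothetical helper for illustration only; not the paper's method.
    """
    followers = defaultdict(Counter)
    for sentence in corpus:
        for i in range(len(sentence)):
            for n in range(1, max_len + 1):
                j = i + n
                if j >= len(sentence):
                    break  # no following character left to count
                followers[sentence[i:j]][sentence[j]] += 1

    entropy = {}
    for prefix, counts in followers.items():
        total = sum(counts.values())
        entropy[prefix] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy
```

On a toy corpus such as `["abc", "abd", "abc", "abd"]`, the string "a" is always followed by "b" (entropy 0), while "ab" is followed by "c" or "d" equally often (entropy 1 bit), so a boundary after "ab" is more plausible than one after "a".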

References

  1. Wei Yang and Longshu Li. 2014. Research on Film box office forecasting Model based on Weibo data. Electronic World, 21(Nov, 2014), 13--16.
  2. Wenqing Zhao, Xiaoke Hou and Haihong Sha. 2014. Application of semantic rules to sentiment analysis of microblog hot topics. CAAI Transactions on Intelligent Systems, 9(2014), 121--125.
  3. Dongxia Zhang. 2013. Analysis and Discovery of Network Hotspot Based On Microblog for College Students. Southeast Communication, 6, (2013), 87--89.
  4. Nianwen Xue. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing, 8, (23, 2003), 29--48.
  5. Sujian Li, Qun Liu and Zhiyong Zhang. 2002. Method of Maximum Entropy Model for Language Processing. Computer Science, 7, (29, 2002), 108--110.
  6. Libang Zhang, Yi Yuan and Jinfeng Yang. 2014. An Unsupervised Approach to Word Segmentation in Chinese EMRs. Intelligent Computer and Applications, 2, (Jan, 2014), 68--71.
  7. Fuchun Peng, Fangfang Feng, and Andrew McCallum. 2004. Chinese Segmentation and New Word Detection Using Conditional Random Fields. In Proceedings of International Conference on Computational Linguistics, 2004, 562--568.
  8. Tao Tang and Qiaoli Zhou. 2011. Term extraction based on the combination of Statistics and rules. Journal of Shenyang Aerospace University, 5, (28, 2011), 71--74.
  9. Yang Zheng and Jianwen Mo. 2012. Chinese word Segmentation method based on Professional term extraction. Popular Science & Technology, 4, (14, 2012), 20--23.
  10. Keda He, Zhengtao Zhu and Yu Cheng. 2016. Research on Text Categorization Based on Improved TF-IDF Algorithm. Journal of Guangdong University of Technology, 5, (33, 2016), 49--53.

      • Published in

        ACM TURC '20: Proceedings of the ACM Turing Celebration Conference - China
        May 2020
        220 pages
        ISBN: 9781450375344
        DOI: 10.1145/3393527

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited