DOI: 10.1145/3573428.3573542
research-article

A Chinese word segmentation method based on dictionary and HMM

Published: 15 March 2023

ABSTRACT

To address segmentation ambiguity and the low success rate of new-word discovery in Chinese word segmentation, this paper proposes a Chinese word segmentation method based on a dictionary and a Hidden Markov Model (HMM). The forward maximum matching and backward maximum matching algorithms produce coarse segmentation results, and the ambiguous fragments are collected and fed into the HMM. The HMM performs a second segmentation pass through word-position tagging, identifies new words, and adds them to the dictionary to expand it. Experimental results show that the proposed algorithm raises the success rate of ambiguity recognition and new-word discovery, improves the accuracy, recall and F1 score of segmentation on ordinary text, and alleviates the decline in Jieba's segmentation performance on professional text.
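The pipeline described in the abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the toy dictionary, the B/M/E/S word-position tag set, the start/transition/emission probabilities, and the rule "a multi-character word produced by the HMM that is missing from the dictionary counts as a new word" are all hypothetical. A fragment is treated as ambiguous when the forward and backward passes place different word boundaries between two boundaries they share.

```python
import math

# Hypothetical toy dictionary and HMM parameters, for illustration only; the
# paper's dictionary, training corpus and probabilities are not given here.
DICTIONARY = {"研究生", "生命", "的", "起源"}
START = {"B": 0.6, "S": 0.4}  # a word starts with B (multi-char) or S (single char)
TRANS = {"B": {"M": 0.3, "E": 0.7}, "M": {"M": 0.3, "E": 0.7},
         "E": {"B": 0.6, "S": 0.4}, "S": {"B": 0.6, "S": 0.4}}
EMIT = {"B": {"研": 0.5, "生": 0.4}, "E": {"究": 0.5, "命": 0.4}}
UNSEEN = math.log(1e-6)  # log-probability for a character a state never emitted


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def fmm(text, words, max_len):
    """Forward maximum matching: repeatedly take the longest dictionary word from the left."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in words:  # single characters always match
                out.append(text[i:i + size])
                i += size
                break
    return out


def bmm(text, words, max_len):
    """Backward maximum matching: repeatedly take the longest dictionary word from the right."""
    out, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            if size == 1 or text[j - size:j] in words:
                out.insert(0, text[j - size:j])
                j -= size
                break
    return out


def cuts(tokens):
    """Character offsets at which a segmentation places a word boundary."""
    pos, result = 0, set()
    for tok in tokens:
        pos += len(tok)
        result.add(pos)
    return result


def viterbi_segment(chars):
    """Tag characters with B/M/E/S by Viterbi decoding, then cut after every E or S tag."""
    states = "BMES"

    def emit(state, ch):
        p = EMIT.get(state, {}).get(ch)
        return math.log(p) if p else UNSEEN

    score = [{s: safe_log(START.get(s, 0)) + emit(s, chars[0]) for s in states}]
    back = [{}]
    for t in range(1, len(chars)):
        score.append({})
        back.append({})
        for s in states:
            # best predecessor state for reaching state s at position t
            prev = max(states, key=lambda p: score[t - 1][p] + safe_log(TRANS[p].get(s, 0)))
            score[t][s] = score[t - 1][prev] + safe_log(TRANS[prev].get(s, 0)) + emit(s, chars[t])
            back[t][s] = prev
    tags = [max("ES", key=lambda s: score[-1][s])]  # the fragment must end a word
    for t in range(len(chars) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    tags.reverse()
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        buf += ch
        if tag in "ES":
            words.append(buf)
            buf = ""
    return words


def segment(text, words):
    """FMM/BMM coarse pass, HMM re-segmentation of disputed spans, dictionary update."""
    max_len = max(len(w) for w in words)
    fwd = cuts(fmm(text, words, max_len))
    bwd = cuts(bmm(text, words, max_len))
    shared = sorted((fwd & bwd) | {0})
    result, new_words = [], []
    for start, end in zip(shared, shared[1:]):
        span = text[start:end]
        inner_fwd = {c for c in fwd if start < c < end}
        inner_bwd = {c for c in bwd if start < c < end}
        if inner_fwd == inner_bwd:
            result.append(span)  # both passes agree, so the span is a single word
        else:
            for w in viterbi_segment(span):  # ambiguous fragment: let the HMM decide
                if len(w) > 1 and w not in words and w not in new_words:
                    new_words.append(w)  # treat unseen multi-char words as new words
                result.append(w)
    words.update(new_words)  # add discovered words to the dictionary
    return result, new_words


if __name__ == "__main__":
    lexicon = set(DICTIONARY)
    print(segment("研究生命的起源", lexicon))
```

On the toy sentence 研究生命的起源, forward matching yields 研究生/命/的/起源 and backward matching yields 研/究/生命/的/起源. The two passes disagree only on 研究生命, which the toy HMM re-segments as 研究/生命, so 研究 is added to the dictionary. Using the shared boundaries of both passes keeps the HMM confined to the disputed spans, which matches the abstract's description of sending only ambiguous fragments to the second pass.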


  • Published in

    ACM Other conferences
    EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
    October 2022
    1999 pages
    ISBN:9781450397148
    DOI:10.1145/3573428

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 15 March 2023


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 508 of 972 submissions, 52%
  • Article Metrics

    • Downloads (last 12 months): 20
    • Downloads (last 6 weeks): 2
