Abstract
Chinese word segmentation is an important research direction in knowledge extraction for elementary mathematics. Segmentation speed directly affects downstream applications, and segmentation accuracy directly affects the research that builds on it. Among machine learning methods for extracting basic mathematical knowledge points, the Conditional Random Field (CRF) model performs well at new word discovery and is increasingly used for knowledge extraction in basic mathematics. This article first introduces the traditional CRF process for named entity recognition. Then, an improved algorithm, CRF++, for the conditional random field model is proposed. Because the recognition rate of named entities with traditional machine learning methods is not high, a post-processing method for entity recognition that automatically generates a dictionary is proposed. After mathematical entities are identified, a pruning strategy that combines the Viterbi algorithm with rules is proposed to achieve a higher recognition rate for elementary mathematical entities. Finally, several methods for disambiguation after entity recognition are introduced.
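As a rough illustration of what combining Viterbi decoding with rule-based pruning can look like, the sketch below decodes a tag sequence while discarding transitions that a rule forbids. It is not the article's method: the BIO tag set, the single rule (an I tag may only follow B or I), the score format, and the names `allowed` and `viterbi` are all assumptions made for this example.

```python
# Minimal sketch, assuming a BIO tagging scheme and log-domain scores.
# TAGS, allowed() and viterbi() are hypothetical names, not from the article.
TAGS = ["B", "I", "O"]
NEG_INF = float("-inf")


def allowed(prev_tag, tag):
    """Assumed pruning rule: an I tag may only follow B or I."""
    return not (tag == "I" and prev_tag == "O")


def viterbi(emissions, transitions, tags=TAGS):
    """Return the highest-scoring tag sequence that satisfies allowed().

    emissions: list of dicts, one per token, mapping tag -> score.
    transitions: dict mapping (prev_tag, tag) -> score.
    """
    if not emissions:
        return []
    # A sequence cannot start inside an entity, so I is pruned at position 0.
    best = [{t: (NEG_INF if t == "I" else emissions[0][t]) for t in tags}]
    back = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Keep only predecessors that are reachable and pass the rule.
            candidates = [
                (best[-1][p] + transitions[(p, t)], p)
                for p in tags
                if allowed(p, t) and best[-1][p] > NEG_INF
            ]
            if candidates:
                score, prev = max(candidates)
                scores[t] = score + em[t]
                pointers[t] = prev
            else:
                scores[t] = NEG_INF
                pointers[t] = None
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In this sketch the rule is applied inside the dynamic program rather than as a filter on the finished output, so a forbidden path is never extended and the decoder returns the best sequence among those the rule permits.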
Acknowledgements
This work was supported by the Natural Science Foundation of Hunan Province (No. 2020JJ4434); the Key Scientific Research Projects of the Department of Education of Hunan Province (No. 19A312); the Hunan Provincial Science & Technology Project (No. 2018TP1018 and No. 2018RS3065); the National Natural Science Foundation of China (No. 61502254); and the Open Project Program of the State Key Lab of CAD&CG, Zhejiang University (No. A1926).
Cite this article
Liu, S., He, T. & Dai, J. A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese. Mobile Netw Appl 26, 1891–1903 (2021). https://doi.org/10.1007/s11036-020-01725-x
DOI: https://doi.org/10.1007/s11036-020-01725-x