A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese

Mobile Networks and Applications

Abstract

Chinese word segmentation is an important part of research on elementary mathematics knowledge extraction. Segmentation speed directly affects downstream applications, and segmentation accuracy directly affects the research steps that follow. Among machine learning methods for extracting basic mathematical knowledge points, the Conditional Random Field (CRF) model performs well at new word discovery and is increasingly used in knowledge extraction for basic mathematics. This article first introduces the traditional CRF process for named entity recognition. Then, an improved algorithm, CRF++, for the conditional random field model is proposed. Because the recognition rate of named entities with traditional machine learning methods is not high, a post-processing method for entity recognition that automatically generates a dictionary is proposed. After mathematical entities are identified, a pruning strategy combining the Viterbi algorithm with rules is proposed to achieve a higher recognition rate for elementary mathematical entities. Finally, several methods of disambiguation after entity recognition are introduced.
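As a rough illustration of what combining Viterbi decoding with rules can look like, the sketch below decodes a tag sequence while skipping transitions that a rule forbids. It is not the authors' method: the B/I/O tag set, the single rule (an "I" tag may not follow "O" or start a sequence), the function names, and the score format are all assumptions chosen for the example.

```python
import math

# Assumed tag set: Begin / Inside / Outside of an entity.
TAGS = ["B", "I", "O"]


def allowed(prev_tag, tag):
    """Hypothetical pruning rule: 'I' may only continue an entity."""
    return not (tag == "I" and prev_tag == "O")


def viterbi(emissions, transitions):
    """Return the highest-scoring tag path that respects the rule.

    emissions: list of {tag: score} dicts, one per character.
    transitions: {(prev_tag, tag): score}; missing pairs score 0.
    """
    # A sequence may not start inside an entity.
    best = {t: (emissions[0][t] if t != "I" else -math.inf) for t in TAGS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointer = {}, {}
        for t in TAGS:
            # Pruning: only predecessors permitted by the rule are scored.
            candidates = [
                (best[p] + transitions.get((p, t), 0.0), p)
                for p in TAGS
                if allowed(p, t)
            ]
            score, prev = max(candidates)
            new_best[t] = score + scores[t]
            pointer[t] = prev
        best = new_best
        backpointers.append(pointer)
    # Walk the back-pointers from the best final tag to the first position.
    tag = max(best, key=best.get)
    path = [tag]
    for pointer in reversed(backpointers):
        tag = pointer[tag]
        path.append(tag)
    return path[::-1]


# Made-up scores for a three-character input.
example_emissions = [
    {"B": 1.5, "I": 0.0, "O": 2.0},
    {"B": 0.5, "I": 2.0, "O": 1.0},
    {"B": 0.0, "I": 0.0, "O": 3.0},
]
print(viterbi(example_emissions, {}))  # ['B', 'I', 'O']
```

With these made-up scores, picking the best tag per character independently would give O, I, O, which is not a valid entity span. The rule removes that path, so the decoder falls back to the best valid one.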

Acknowledgements

This work was supported by the Natural Science Foundation of Hunan Province (No. 2020JJ4434); the Key Scientific Research Projects of the Department of Education of Hunan Province (No. 19A312); the Hunan Provincial Science & Technology Project (Nos. 2018TP1018 and 2018RS3065); the National Natural Science Foundation of China (No. 61502254); and the Open Project Program of the State Key Lab of CAD&CG, Zhejiang University (No. A1926).

Author information

Corresponding author

Correspondence to Shuai Liu.

Cite this article

Liu, S., He, T. & Dai, J. A Survey of CRF Algorithm Based Knowledge Extraction of Elementary Mathematics in Chinese. Mobile Netw Appl 26, 1891–1903 (2021). https://doi.org/10.1007/s11036-020-01725-x

  • DOI: https://doi.org/10.1007/s11036-020-01725-x
