A term extraction algorithm based on machine learning and comprehensive feature strategy

  • S.I.: Machine Learning and Big Data Analytics for IoT Security and Privacy (SPIoT 2022)
Neural Computing and Applications

Abstract

Manual term extraction works as its name suggests: a translator reads through the text, classifies the words, and prepares them for translation. Terminology is a concentrated carrier of domain expertise, and the creation, popularization, and disappearance of terms dynamically reflect the development and evolution of an industry. Automatic term extraction is a key technology for building a professional terminology database and an important topic in natural language processing. This paper studies a term extraction algorithm based on machine learning and a comprehensive feature strategy. To address the poor generality of current term extraction algorithms and their reliance on a single statistical feature, it proposes an improved domain ontology term extraction algorithm based on a comprehensive feature strategy. It also reports automatic term extraction experiments with two word-based machine learning models, a maximum entropy model and a conditional random field model; the conditional random field model outperforms the maximum entropy model. The experimental results show that the algorithm based on the comprehensive feature strategy improves accuracy by 8.6% compared with the TF-IDF algorithm and the C-value term extraction algorithm. The algorithm extracts terms from text effectively and generalizes well.
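
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of TF-IDF scoring and of the C-value measure in their commonly cited textbook forms. It is not the authors' implementation and says nothing about their comprehensive feature strategy; the function names (`tf_idf`, `c_value`, `contains`) and the toy inputs are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score multiword candidates; `freq` maps word tuples to frequencies."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of longer candidates in which `a` is nested.
        nests = [f_b for b, f_b in freq.items()
                 if len(b) > len(a) and contains(b, a)]
        # Discount occurrences of `a` that are explained by longer terms.
        adjusted = f_a - sum(nests) / len(nests) if nests else f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical toy data, for illustration only.
docs = [["term", "extraction", "algorithm"], ["term", "database"]]
tfidf_scores = tf_idf(docs)

freq = {("term", "extraction"): 5, ("automatic", "term", "extraction"): 2}
cvalue_scores = c_value(freq)
```

A word that appears in every document, such as "term" in the toy corpus, receives a TF-IDF score of zero, while C-value discounts a candidate by the average frequency of the longer candidates that contain it. Note that the `log2` length factor gives single-word candidates a score of zero, so this form is meant for multiword candidates.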

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Funding

There is no funding for this study.

Author information

Corresponding author

Correspondence to Xiaomei Hu.

Ethics declarations

Conflict of interest

There is no potential conflict of interest in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, X., Cheng, B., Hu, X. et al. A term extraction algorithm based on machine learning and comprehensive feature strategy. Neural Comput & Applic 36, 2385–2398 (2024). https://doi.org/10.1007/s00521-023-08960-9

  • DOI: https://doi.org/10.1007/s00521-023-08960-9
