Named Entity Extraction from Semi-structured Data Using Machine Learning Algorithms

  • Conference paper
Computational Collective Intelligence (ICCCI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Abstract

Modern society has witnessed how the intensive development of Internet technologies has led to an information explosion over recent decades. This explosion has shown itself as exponential growth in data volume, much of it low-quality information. This paper describes intelligent tools that support decision making through automatic knowledge extraction. In the first part of the paper, we consider preprocessing that includes morphological analysis of texts. We then consider a model of text documents in the form of a hypergraph and an implementation of the random walk method to extract semantically close word pairs, in other words, pairs that often appear together. The result of the calculations is a matrix of word affinity coefficients that relates each component of the vocabulary vector to every other component. In the second part, we describe the training of a neural network to extract linguistic constructions, which include possible values of named-entity descriptors in the text. The neural network can retrieve information on one preselected descriptor, for example location, in which case the final result is the names of geographical objects. In the general case, the neural network can retrieve information on several descriptors simultaneously.
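
The hypergraph random walk mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction, so everything here is an assumption: each sentence is treated as one hyperedge over its distinct words, the walker picks a hyperedge containing the current word uniformly at random and then a word inside that hyperedge uniformly at random, and the one-step transition probabilities are read as affinity coefficients. The function name `affinity_matrix` and the toy sentences are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def affinity_matrix(sentences):
    """Return (vocab, P) where P[i][j] is the one-step random-walk
    probability of moving from word i to word j on a hypergraph whose
    hyperedges are the sentences (assumed construction, see lead-in)."""
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    # One hyperedge per non-empty sentence; repeated words count once.
    edges = [set(sentence) for sentence in sentences if sentence]

    # Vertex degree = number of hyperedges that contain the word.
    degree = defaultdict(int)
    for edge in edges:
        for word in edge:
            degree[word] += 1

    n = len(vocab)
    P = [[0.0] * n for _ in range(n)]
    for edge in edges:
        for u in edge:
            for v in edge:
                # Choose this edge with prob 1/degree(u), then v with 1/|edge|.
                # The walker may stay on u, so the diagonal is non-zero.
                P[index[u]][index[v]] += 1.0 / (degree[u] * len(edge))
    return vocab, P


# Toy corpus, already tokenized and normalized (hypothetical data).
sentences = [
    ["almaty", "city", "kazakhstan"],
    ["almaty", "city"],
    ["river", "kazakhstan"],
]
vocab, P = affinity_matrix(sentences)
for word, row in zip(vocab, P):
    print(word, [round(p, 3) for p in row])
```

Each row of `P` sums to 1, and words that never share a sentence get a coefficient of 0. Longer walks could be approximated by taking powers of `P`; whether the authors use one step, several steps, or weighted hyperedges is not stated in the abstract.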

Acknowledgments

This work was supported in part by a grant of the Foundation of the Ministry of Education and Science of the Republic of Kazakhstan, “Development of a system for knowledge extraction from heterogeneous data sources to improve the quality of decision-making” (2018–2020), and by Russian Foundation for Basic Research (RFBR) grant No. 18-07-01457.

Author information

Corresponding author

Correspondence to Madina Mansurova.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mansurova, M., Barakhnin, V., Khibatkhanuly, Y., Pastushkov, I. (2019). Named Entity Extraction from Semi-structured Data Using Machine Learning Algorithms. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_6

  • DOI: https://doi.org/10.1007/978-3-030-28374-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28373-5

  • Online ISBN: 978-3-030-28374-2

  • eBook Packages: Computer Science (R0)
