Named Entity Extraction from Semi-structured Data Using Machine Learning Algorithms

Mansurova, Madina; Barakhnin, Vladimir; Khibatkhanuly, Yerzhan; Pastushkov, Ilya

doi:10.1007/978-3-030-28374-2_6

Madina Mansurova (ORCID: orcid.org/0000-0002-9680-2758), Vladimir Barakhnin, Yerzhan Khibatkhanuly & Ilya Pastushkov

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11684)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Modern society has witnessed how the intensive development of Internet technologies over the last decades has led to an information explosion. This explosion has manifested itself as exponential growth in the volume of data, accompanied by low-quality information. This paper describes in detail some intelligent tools that support decision-making through automatic knowledge extraction. In the first part of the paper, we consider preprocessing, which includes morphological analysis of texts. We then consider a model of text documents in the form of a hypergraph and an implementation of the random walk method to extract semantically close pairs of words, in other words, pairs that often appear together. The result of the calculations is a matrix of affinity coefficients between each word and every other component of the vocabulary vector. In the second part, we describe the training of a neural network for extracting linguistic constructions. These constructions include possible values of the descriptors of named entities in the text. The neural network makes it possible to retrieve information for one preselected descriptor, for example, location, yielding the names of geographical objects as the final result. In the general case, the neural network can retrieve information for several descriptors simultaneously.
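
As an illustration of the hypergraph idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each sentence forms one hyperedge over its distinct tokens, that all hyperedges have equal weight, and that affinity is the symmetrised one-step random-walk probability. The abstract does not specify any of these choices, and the function names (build_hyperedges, transition_matrix, affinity) are hypothetical.

```python
from collections import defaultdict


def build_hyperedges(sentences):
    """Assumption: one hyperedge per sentence, holding its distinct lowercase tokens."""
    return [set(s.lower().split()) for s in sentences if s.strip()]


def transition_matrix(hyperedges):
    """One-step random-walk probabilities between words.

    From word u the walker picks, uniformly, one hyperedge that contains u
    (and at least one other word), then moves, uniformly, to a different
    word inside that hyperedge.
    """
    vocab = sorted(set().union(*hyperedges))
    incident = defaultdict(list)
    for edge in hyperedges:
        for word in edge:
            incident[word].append(edge)

    P = {u: defaultdict(float) for u in vocab}
    for u in vocab:
        usable = [e for e in incident[u] if len(e) > 1]
        for edge in usable:
            step = 1.0 / (len(usable) * (len(edge) - 1))
            for v in edge:
                if v != u:
                    P[u][v] += step
    return vocab, P


def affinity(P, u, v):
    """Assumed affinity coefficient: the mean of the two one-step probabilities."""
    return 0.5 * (P[u].get(v, 0.0) + P[v].get(u, 0.0))


# Toy corpus, invented for illustration only.
sentences = [
    "almaty is a city",
    "almaty is in kazakhstan",
    "novosibirsk is a city",
]
vocab, P = transition_matrix(build_hyperedges(sentences))
matrix = [[affinity(P, u, v) for v in vocab] for u in vocab]
```

Here matrix plays the role of the word affinity matrix: words that share many hyperedges (such as "almaty" and "is") receive a higher coefficient than words that never co-occur (such as "almaty" and "novosibirsk"). A longer walk, hyperedge weights, or morphological normalisation of tokens could be added on top of this, but those would be further assumptions.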

Acknowledgments

This work was supported in part by a grant from the Foundation of the Ministry of Education and Science of the Republic of Kazakhstan, “Development of a system for knowledge extraction from heterogeneous data sources to improve the quality of decision-making” (2018–2020), and by RFBR (Russian Federation) grant No. 18-07-01457.

Author information

Authors and Affiliations

Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
Madina Mansurova & Yerzhan Khibatkhanuly
Institute of Computational Technologies SB RAS, Novosibirsk, Russian Federation
Vladimir Barakhnin & Ilya Pastushkov
Novosibirsk State University, Novosibirsk, Russian Federation
Vladimir Barakhnin & Ilya Pastushkov

Corresponding author

Correspondence to Madina Mansurova.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
University of Pau and Pays de l'Adour, Pau, France
Richard Chbeir
University of Pau and Pays de l'Adour, Pau, France
Ernesto Exposito
University of Pau and Pays de l'Adour, Pau, France
Philippe Aniorté
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Mansurova, M., Barakhnin, V., Khibatkhanuly, Y., Pastushkov, I. (2019). Named Entity Extraction from Semi-structured Data Using Machine Learning Algorithms. In: Nguyen, N., Chbeir, R., Exposito, E., Aniorté, P., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2019. Lecture Notes in Computer Science, vol 11684. Springer, Cham. https://doi.org/10.1007/978-3-030-28374-2_6

DOI: https://doi.org/10.1007/978-3-030-28374-2_6
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28373-5
Online ISBN: 978-3-030-28374-2
eBook Packages: Computer Science, Computer Science (R0)