Abstract
Detecting and resolving entities is an important step in information retrieval applications. Humans recognize entities from context, but information extraction systems (IES) must apply sophisticated algorithms to do so. This paper describes the development and implementation of an entity recognition algorithm. The implemented system is integrated with an IES that derives triples from unstructured text. This integration makes the triples more valuable for query answering because they refer to identified entities. A dictionary of entities and their contexts is built by extracting information from Wikipedia. The entity recognition step computes a score that combines context similarity, based on cosine similarity with a tf-idf weighting scheme, with string similarity. The implemented system shows good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types.
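The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the two similarities are combined, so the linear weight `alpha` is an assumption. `difflib.SequenceMatcher` stands in for whichever string similarity measure the paper uses, the tf-idf variant is one common smoothed form, and the two-entry dictionary is hypothetical toy data rather than content extracted from Wikipedia.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Lowercased whitespace tokenization; a real system would do more.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term frequency times a smoothed idf (an assumed variant).
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def string_similarity(a, b):
    # Stand-in measure: the paper's actual string metric is not given here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(mention, mention_context, dictionary, alpha=0.5):
    """Rank dictionary entities for a mention, best first.

    `dictionary` maps an entity name to its context text. `alpha` weights
    context similarity against string similarity (assumed combination).
    """
    names = list(dictionary)
    # The mention context is included in the corpus used for idf.
    vectors = tfidf_vectors([mention_context] + [dictionary[n] for n in names])
    query_vec, entity_vecs = vectors[0], vectors[1:]
    scored = []
    for name, vec in zip(names, entity_vecs):
        score = (alpha * cosine(query_vec, vec)
                 + (1 - alpha) * string_similarity(mention, name))
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy dictionary of entities and their contexts.
dictionary = {
    "Jaguar (animal)": "large cat species native to the americas rainforest predator",
    "Jaguar Cars": "british luxury car manufacturer vehicles engine",
}
ranking = rank_candidates(
    "Jaguar", "the jaguar is a predator living in the rainforest", dictionary
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy case the string measure alone prefers the shorter name "Jaguar Cars", and it is the context term that moves the animal sense to the top, which is the behaviour the combined score is meant to provide.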
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hanafiah, N., Quix, C. (2014). Entity Recognition in Information Extraction. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_12
DOI: https://doi.org/10.1007/978-3-319-05476-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05475-9
Online ISBN: 978-3-319-05476-6
eBook Packages: Computer Science, Computer Science (R0)