Named Entity Based Document Similarity with SVM-Based Re-ranking for Entity Linking

Alhelbawy, Ayman; Gaizauskas, Rob

doi:10.1007/978-3-642-35326-0_38


Part of the book series: Communications in Computer and Information Science (CCIS, volume 322)

Included in the following conference series:

International Conference on Advanced Machine Learning Technologies and Applications


Abstract

In this paper we present a novel approach to searching a knowledge base for an entry that contains information about a named entity (NE) mention as specified within a given context. A document similarity function (NEBSim) based on NE co-occurrence has been developed to calculate the similarity between two documents given a specific NE mention in one of them. NEBSim is also used in conjunction with the traditional cosine similarity measure to learn a ranking model. Naive Bayes and SVM classifiers are used to re-rank the retrieved documents. Our experiments, carried out on TAC-KBP 2011 data, show that NEBSim achieves a significant improvement in accuracy compared with a cosine similarity approach. They also show that re-ranking using learning-to-rank techniques can significantly improve accuracy at high ranks.
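
The abstract does not give the NEBSim formula, so the following is only a minimal illustrative sketch of the general idea: score each knowledge-base candidate by combining a term-based cosine similarity with a similarity computed over co-occurring named entities. The function `ne_overlap_similarity` (a Jaccard overlap that excludes the mention itself), the fixed `weight` parameter, and the toy data are all assumptions made for illustration. They are not the authors' NEBSim definition, and the paper learns the combination with a ranking model rather than a fixed weight.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ne_overlap_similarity(query_entities, candidate_entities, mention):
    """Hypothetical stand-in for an NE co-occurrence similarity.

    Jaccard overlap of the named entities found in both documents,
    ignoring the mention being linked. NOT the paper's NEBSim formula.
    """
    q = {e.lower() for e in query_entities} - {mention.lower()}
    c = {e.lower() for e in candidate_entities} - {mention.lower()}
    union = q | c
    if not union:
        return 0.0
    return len(q & c) / len(union)


def rank_candidates(query, candidates, mention, weight=0.5):
    """Rank candidates by a fixed-weight mix of the two similarities.

    The fixed weight is an assumption; a learned ranker would replace it.
    """
    scored = []
    for name, cand in candidates.items():
        cos = cosine_similarity(query["tokens"], cand["tokens"])
        ne = ne_overlap_similarity(query["entities"], cand["entities"], mention)
        scored.append((weight * ne + (1 - weight) * cos, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Toy data, invented for illustration only.
query = {
    "tokens": ["washington", "tech", "firms", "seattle"],
    "entities": ["Washington", "Seattle", "Microsoft"],
}
candidates = {
    "Washington (state)": {
        "tokens": ["washington", "state", "seattle", "olympia"],
        "entities": ["Seattle", "Olympia", "Microsoft"],
    },
    "George Washington": {
        "tokens": ["george", "washington", "president", "army"],
        "entities": ["Martha Washington", "Continental Army"],
    },
}
ranking = rank_candidates(query, candidates, mention="Washington")
print(ranking)
```

In this toy case the candidate that shares co-occurring entities with the query document ranks first, which is the intuition behind using NE co-occurrence alongside plain term overlap.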



Author information

Authors and Affiliations

Computer Science Department, University of Sheffield, Sheffield, UK
Ayman Alhelbawy & Rob Gaizauskas
Information Science Department, Fayoum University, Fayoum, Egypt
Ayman Alhelbawy


Editor information

Editors and Affiliations

Cairo University, Egypt
Aboul Ella Hassanien & Rabie Ramadan
Ain Shams University, Cairo, Egypt
Abdel-Badeeh M. Salem
University of Tasmania, TAS, Australia
Tai-hoon Kim


About this paper

Cite this paper

Alhelbawy, A., Gaizauskas, R. (2012). Named Entity Based Document Similarity with SVM-Based Re-ranking for Entity Linking. In: Hassanien, A.E., Salem, AB.M., Ramadan, R., Kim, Th. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2012. Communications in Computer and Information Science, vol 322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35326-0_38


DOI: https://doi.org/10.1007/978-3-642-35326-0_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35325-3
Online ISBN: 978-3-642-35326-0
eBook Packages: Computer Science, Computer Science (R0)
