Abstract
In this paper we present a novel approach to search a knowledge base for an entry that contains information about a named entity (NE) mention as specified within a given context. A document similarity function (NEBSim) based on NE co-occurrence has been developed to calculate the similarity between two documents given a specific NE mention in one of them. NEBsim is also used in conjunction with the traditional cosine similarity measure to learn a model for ranking. Naive Bayes and SVM classifiers are used to re-rank the retrieved documents. Our experiments, carried out on TAC-KBP 2011 data, show NEBsim achieves significant improvement in accuracy as compared with a cosine similarity approach. They also show that re-ranking using learn to rank techniques can significantly improve the accuracy at high ranks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
McNamee, P., Dang, H.T.: Overview of the TAC 2009 knowledge base population track. In: Text Analysis Conference TAC (2009)
Bunescu, R.C., Pasca, M.: Using Encyclopedic Knowledge for Named entity Disambiguation. In: Proceedings of EACL, vol. 6 (2006)
Cucerzan, S.: Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of EMNLP-CoNLL (2007)
Han, X., Sun, L., Zhao, J.: Collective entity linking in web text: a graph-based method. In: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information (2011)
Bagga, A., Baldwin, B.: Entity-based cross-document coreferencing using the vector space model. In: Proceedings of the 17th International Conference on Computational Linguistics, vol. 1 (1998)
Zheng, Z., Li, F., Huang, M., Zhu, X.: Learning to link entities with knowledge base. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010)
Petkova, D., Croft, W.B.: Proximity-based document representation for named entity retrieval. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 731–740 (2007)
Gottipati, S., Jiang, J.: Linking Entities to a Knowledge Base with Query Expansion. In: Empirical Methods in Natural Language Processing, EMNLP (2011)
Han, X., Sun, L.: A generative entity-mention model for linking entities with knowledge base. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (2011)
Mandl, T., Womser-Hacker, C.: The effect of named entities on effectiveness in cross-language information retrieval evaluation. In: Proceedings of the 2005 ACM Symposium on Applied Computing (2005)
Kumaran, G., Allan, J.: Text classification and named entities for new event detection. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2004)
Reddy, B.K., Kumar, K., Krishna, S., Pingali, P., Varma, V.: Linking Named Entities to a Structured Knowledge Base. International Journal of Computational Linguistics and Applications 1(1-2), 121–136 (2010)
Lin, D.: An Information-Theoretic Definition of Similarity. Morgan Kaufmann (1998)
Liu, T.Y.: Learning to rank for information retrieval. Morgan Springer-Verlag New York Inc. (2011)
Joachims, T.: Optimizing search engines using clickthrough data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002)
Joachims, T.: Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002)
Yue, Y., Finley, T., Radlinski, F., Joachims, T.: A support vector method for optimizing average precision. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alhelbawy, A., Gaizauskas, R. (2012). Named Entity Based Document Similarity with SVM-Based Re-ranking for Entity Linking. In: Hassanien, A.E., Salem, AB.M., Ramadan, R., Kim, Th. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2012. Communications in Computer and Information Science, vol 322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35326-0_38
Download citation
DOI: https://doi.org/10.1007/978-3-642-35326-0_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35325-3
Online ISBN: 978-3-642-35326-0
eBook Packages: Computer ScienceComputer Science (R0)