Abstract
Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia articles. Traditional approaches to D2W have focused either on a single language (e.g. English) or on formal texts (e.g. news articles). In this paper, we present a multilingual framework with a set of new features that can be obtained purely from the online encyclopedia, without the need for any language-specific tools. We analyze these features across different languages and domains. The approach is shown to be fully language-independent and has been applied successfully to English, Italian, and Polish, with consistent improvement. We show that only a sufficient number of Wikipedia articles is needed for training. When trained on real-world data sets for English, our new features yield a substantial improvement over current local and global disambiguation algorithms. Finally, the adaptation to the Bridgeman query logs in digital libraries shows the robustness of our approach even in the absence of disambiguation context. Moreover, because no language-specific tools are needed, the method can be applied to other languages in a similar manner with little adaptation.
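To illustrate what a feature "obtained purely from the online encyclopedia" can look like, the sketch below computes commonness, a prior widely used in the D2W literature: the fraction of times an anchor text links to each candidate article. This is an illustrative assumption and not necessarily one of the features proposed in the paper. The function names and the toy link data are hypothetical. The computation relies only on link counts, so it needs no tokenizer, tagger, or other language-specific tool.

```python
from collections import Counter, defaultdict


def build_anchor_counts(links):
    """Count how often each anchor text links to each article title.

    `links` is an iterable of (anchor_text, target_title) pairs, assumed
    to have been harvested from Wikipedia link markup beforehand.
    """
    counts = defaultdict(Counter)
    for anchor, target in links:
        # casefold() is Unicode-aware, so no per-language rules are needed.
        counts[anchor.casefold()][target] += 1
    return counts


def commonness(counts, mention):
    """Return {title: P(title | mention)} estimated from anchor counts."""
    targets = counts.get(mention.casefold())
    if not targets:
        return {}
    total = sum(targets.values())
    return {title: n / total for title, n in targets.items()}


def link_mention(counts, mention):
    """Pick the most common target for a mention, or None if unseen."""
    scores = commonness(counts, mention)
    return max(scores, key=scores.get) if scores else None


# Toy, made-up link data (not taken from the paper or from a real dump).
links = [
    ("jaguar", "Jaguar"),
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar Cars"),
    ("Roma", "Rome"),
]
counts = build_anchor_counts(links)
best = link_mention(counts, "Jaguar")
```

Because this prior ignores the surrounding text, it still produces a ranking for very short inputs such as search queries, where little or no disambiguation context is available.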
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, TV.T. (2013). Disambiguation to Wikipedia: A Language and Domain Independent Approach. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_27
DOI: https://doi.org/10.1007/978-3-642-45068-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science, Computer Science (R0)