Abstract
Given the large number of name mentions appearing on the web, entity linking has recently become a hot research topic: it assigns an entity in a knowledge resource to a name mention to help users grasp the meaning of that mention. Unfortunately, as in word sense disambiguation, a name mention can refer to several entities when its context is not considered. Name mentions that frequently co-occur are usually related and can be considered together to determine their suitable entities. This approach is called collective entity linking and is often conducted on an entity graph. However, traditional collective entity linking methods either consume much time because of the large scale of the entity graph, or obtain low accuracy because they simplify the graph to boost speed. To improve both accuracy and efficiency, this paper proposes a novel collective entity linking algorithm. It constructs a complete entity graph by connecting any two related entities and measures the relationship between two entities with a random walk-based method. The relationships between entities are then modeled as a relationship matrix, and a hill-climbing-based algorithm is proposed to turn the entity linking task into a sub-matrix searching problem. Experimental results demonstrate that our linking algorithm achieves both accurate linking results and low running time.
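To make the idea of "entity linking as sub-matrix search" concrete, the following Python sketch shows one generic form of hill climbing over a relationship matrix. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a precomputed symmetric relatedness matrix `rel` over all candidate entities and a list `candidates` giving, for each name mention, the indices of its candidate entities. Choosing one candidate per mention selects a sub-matrix of `rel`, and the hypothetical objective is the sum of pairwise relatedness inside that sub-matrix. The function names, the objective, and the starting assignment are all assumptions.

```python
from itertools import combinations
from typing import List, Sequence


def submatrix_score(rel: Sequence[Sequence[float]], chosen: List[int]) -> float:
    """Sum of pairwise relatedness among the chosen entities (upper triangle of the sub-matrix)."""
    return sum(rel[a][b] for a, b in combinations(chosen, 2))


def hill_climb_link(rel: Sequence[Sequence[float]], candidates: List[List[int]]) -> List[int]:
    """Choose one candidate entity per mention by first-improvement hill climbing.

    rel        -- symmetric relatedness matrix over all candidate entities (assumed precomputed)
    candidates -- candidates[m] lists the entity indices that mention m may link to
    """
    # Hypothetical starting point: the first listed candidate of every mention.
    chosen = [cands[0] for cands in candidates]
    best = submatrix_score(rel, chosen)
    improved = True
    while improved:  # stop at a local optimum: no single switch raises the score
        improved = False
        for m, cands in enumerate(candidates):
            for e in cands:
                if e == chosen[m]:
                    continue
                trial = chosen[:m] + [e] + chosen[m + 1:]
                score = submatrix_score(rel, trial)
                if score > best:  # accept the first strictly better neighbour
                    chosen, best = trial, score
                    improved = True
    return chosen


# Toy example: entity 0 = basketball player Michael Jordan, 1 = scientist Michael I. Jordan,
# 2 = Chicago Bulls. Mention "Jordan" may link to 1 or 0; mention "Bulls" can only link to 2.
rel = [[0.0, 0.1, 0.9],
       [0.1, 0.0, 0.1],
       [0.9, 0.1, 0.0]]
print(hill_climb_link(rel, [[1, 0], [2]]))  # [0, 2]
```

Because the search only moves to a strictly better assignment and the number of assignments is finite, it always terminates, but only at a local optimum, so the initial assignment matters.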
References
Meij E, Balog K, Odijk D (2013) Entity linking and retrieval. In: Proceedings of the 37th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 1127–1127
Shen W, Wang JY, Han JW (2015) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans Knowl Data Eng 27:443–460
He ZY, Liu SJ, Song Y, Li M, Zhou M, Wang HF (2013) Efficient collective entity linking with stacking. In: Proceedings of the 2013 conference on empirical methods in natural language processing. ACL, Stroudsburg, pp 426–435
Hachey B, Radford W, Nothman J, Honnibal M, Curran JR (2013) Evaluating entity linking with Wikipedia. Artif Intell 194:130–150
Sen P (2012) Collective context-aware topic models for entity disambiguation. In: Proceedings of the 21st international conference on World Wide Web. ACM, New York, pp 729–738
Namata GM, Kok S, Getoor L (2011) Collective graph identification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 87–95
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. ACM, New York, pp 121–124
Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Discovering and maintaining links on the web of data. In: Proceedings of the 8th international semantic web conference, SWSA, Karlsruhe, Germany, pp 650–665
Ngomo A-CN, Auer S (2011) LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the 22nd international joint conference on artificial intelligence. AAAI, Palo Alto, pp 2312–2317
Benjelloun O, Garcia-Molina H, Menestrina D, Su Q, Whang SE, Widom J (2009) Swoosh: a generic approach to entity resolution. Int J Very Large Data Bases 18:255–276
Blanco R, Ottaviano G, Meij E (2015) Fast and space-efficient entity linking in queries. In: Proceedings of the 8th ACM international conference on web search and data mining. ACM, New York, pp 179–188
Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) KORE: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, New York, pp 545–554
Cucerzan S (2007) Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 conference on empirical methods in natural language processing and computational natural language learning. ACL, Prague, pp 708–716
Shen W, Han JW, Wang JY (2014) A probabilistic model for linking named entities in web text with heterogeneous information networks. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, New York, pp 1199–1210
He ZY, Liu SJ, Li M, Zhou M, Zhang LK, Wang HF (2013) Learning entity representation for entity disambiguation. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics. ACL, Stroudsburg, pp 30–34
Zheng ZC, Li FT, Huang ML, Zhu XY (2010) Learning to link entities with knowledge base. In: Proceedings of the 2010 annual conference of the North American chapter of the ACL. ACL, Stroudsburg, pp 483–491
Zuo Z, Kasneci G, Grütze T, Naumann F (2014) BEL: bagging for entity linking. In: Proceedings of the 25th international conference on computational linguistics. ICCL, Dublin, pp 2075–2086
Han XP, Sun L (2011) A generative entity-mention model for linking entities with knowledge base. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 945–954
Bunescu RC, Pasca M (2006) Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics. ACL, Trento, pp 9–16
Han XP, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 765–774
Usbeck R, Ngomo A-CN, Röder M, Gerber D, Coelho SA, Auer S, Both A (2014) AGDISTIS-Agnostic disambiguation of named entities using linked open data. In: Proceedings of the 21st European conference on artificial intelligence. IOS, Amsterdam, pp 1113–1114
Hachey B, Radford W, Curran JR (2011) Graph-based named entity linking with Wikipedia. In: Proceedings of the 12th international conference on web information system engineering. Springer, Berlin, pp 213–226
Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information and knowledge management. ACM, New York, pp 139–148
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792
Kulkarni S, Singh A, Ramakrishnan G, Chakrabarti S (2009) Collective annotation of Wikipedia entities in web text. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, London, pp 457–466
Guo M, Liu Y, Li J, Li H, Xu B (2014) A knowledge based approach for tackling mislabeled multi-class big social data. In: Proceedings of the 11th semantic web: trends and challenges. Springer, Anissaras, pp 349–363
Tang M, Agrawal P, Nie F, Pongpaichet S, Jain R (2016) A graph based multimodal geospatial interpolation framework. In: Proceedings of the 2016 IEEE international conference on multimedia and expo. IEEE, Seattle, pp 1–6
Tang M, Nie F, Jain R (2016) Capped LP-norm graph embedding for photo clustering. In: Proceedings of the 2016 ACM on multimedia conference. ACM, Amsterdam, pp 431–435
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) LINE: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, Florence, pp 1067–1077
Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 701–710
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, San Francisco, pp 1225–1234
Nguyen DB, Hoffart J, Theobald M, Weikum G (2014) AIDA-light: high-throughput named-entity disambiguation. In: Linked data on the web, WWW, Seoul
Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to Wikipedia. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 1375–1384
Ristad ES, Yianilos PN (1998) Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell 20:522–532
Cohen S (2013) Indexing for subtree similarity-search using edit distance. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, pp 49–60
Guo YH, Che WX, Liu T, Li S (2011) A graph-based method for entity linking. In: Proceedings of the 5th international joint conference on natural language processing. ACL, Stroudsburg, pp 1010–1018
Wang W, Xiao C, Lin XM, Zhang CQ (2009) Efficient approximate entity extraction with edit distance constraints. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data. ACM, Providence, pp 759–770
Kim HM, Biehl M (2005) Exploiting the small-worlds of the semantic web to connect heterogeneous, local ontologies. Inf Technol Manag 6:89–96
Ahn J, Kim K (2012) Lower bound on expected complexity of depth-first tree search with multiple radii. IEEE Commun Lett 16:805–808
Langville AN, Meyer CD (2006) Updating Markov chains with an eye on Google’s PageRank. SIAM J Matrix Anal Appl 27:968–987
Hoffart J, Altun Y, Weikum G (2014) Discovering emerging entities with ambiguous names. In: Proceedings of the 23rd international conference on World Wide Web. ACM, New York, pp 385–396
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792
Ji H, Nothman J, Hachey B (2014) Overview of TAC-KBP2014 entity discovery and linking tasks. In: Proceedings of the text analysis conference, NIST, Gaithersburg, pp 1–15
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1, Article 2
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining. ACM, Chicago, pp 177–187
McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Proceedings of the 2012 advances in neural information processing systems, Lake Tahoe, Nevada, USA, pp 1–9
Acknowledgements
We thank Dr. Lei Chen for enthusiastic and selfless help in re-implementing many baseline algorithms. We also thank the anonymous reviewers for many helpful and insightful suggestions that helped us revise our paper. This work is supported by the National Natural Science Foundation of China (Nos. 61632011, 61772156, and 61702137), Microsoft Research Asia, the CCF-Tencent Open Fund (No. CCF-TencentIAGR20160109), HIT-Tencent (No. AGR201601), and the 2015 Guangdong Provincial Key Platform Project-the Youth Innovative Talent Funding.
Appendix
Lemma 3.1
\( E(\overline{\pi_{ij}}) = \overrightarrow{P(i,j)} \), where \( \overline{\pi_{ij}} \) denotes the number of visits on node \( j \) by the random walks that all start from node \( i \), divided by the number of visits on all nodes, and \( \overrightarrow{P(i,j)} \) denotes the probability that a random walk starts from node \( i \) and ends at node \( j \).
Proof
Let \( A \) denote the transition matrix. \( \overrightarrow{P(i,j)} \) can be calculated by
where \( l \) denotes the length of a path between nodes \( i \) and \( j \). Since we limit the maximum path length between two candidates to 5, \( l \) is at most 5. \( A_{ij} \) denotes the entry in the \( i \)th row and \( j \)th column of \( A \). Since \( A_{ij} \) is the transition probability from node \( i \) to node \( j \), as indicated in [42], \( (A^{d})_{ij} \) is the probability that a random walk starting from node \( i \) reaches node \( j \) after exactly \( d \) steps.
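The step from \( A_{ij} \) to \( (A^{d})_{ij} \) can be checked numerically. The Python sketch below assumes a small hand-made row-stochastic matrix rather than a real entity graph. It computes \( A^{d} \) by repeated multiplication and compares one entry with the probability obtained by summing, over every length-\( d \) node sequence from \( i \) to \( j \), the product of its transition probabilities. The helper names are hypothetical.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def mat_pow(a: Matrix, d: int) -> Matrix:
    """A^d for d >= 1 by repeated multiplication (adequate for d <= 5)."""
    result = a
    for _ in range(d - 1):
        result = mat_mul(result, a)
    return result


def path_probability(a: Matrix, i: int, j: int, d: int) -> float:
    """Sum over all d-step node sequences i -> ... -> j of the product of transition probabilities."""
    total = 0.0
    for middle in product(range(len(a)), repeat=d - 1):
        nodes = (i, *middle, j)
        p = 1.0
        for u, v in zip(nodes, nodes[1:]):
            p *= a[u][v]
        total += p
    return total


# Toy row-stochastic transition matrix (each row sums to 1).
A = [[0.0, 0.5, 0.5],
     [0.2, 0.0, 0.8],
     [0.6, 0.4, 0.0]]

for d in range(1, 6):  # path lengths 1..5, matching the cap stated in the text
    print(d, round(mat_pow(A, d)[0][2], 6), round(path_probability(A, 0, 2, d), 6))
```

The two printed columns agree for every \( d \). Path enumeration grows as \( n^{d-1} \) in the number of nodes \( n \), which is why the matrix-power form is the practical one.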
Let \( VI_{ij} \) denote the estimated number of visits on node \( j \) by a random walk that starts from node \( i \). The expectation of \( VI_{ij} \) is
Using \( VI_{ij} \), the estimate in Eq. 13 can be rewritten as
where \( VI_{ij}(t) \) denotes the number of visits on node \( j \) by the random walk that starts from node \( i \) at the \( t \)th time.
Because random walks starting from different nodes are independent of each other, and random walks starting from the same node at different times are also independent of each other, we have
Substituting Eq. 19 into Eq. 23, we readily obtain
□
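Lemma 3.1 states that the visit frequency collected from repeated random walks is an unbiased estimate of \( \overrightarrow{P(i,j)} \). Since the closed form of \( \overrightarrow{P(i,j)} \) is not reproduced in this section, the following Python sketch assumes one concrete reading: every walk from node \( i \) takes exactly \( L = 5 \) steps and every step counts as one visit. Each walk then contributes exactly \( L \) visits, so the expected share of visits on node \( j \) is \( \frac{1}{L}\sum_{d=1}^{L} (A^{d})_{ij} \). The sketch simulates independent walks, matching the independence assumption used in the proof, and compares the empirical share with that value. The toy matrix, walk count, seed, and function names are illustrative assumptions, and the paper's own equation may normalise differently.

```python
import random
from typing import List

Matrix = List[List[float]]


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def expected_visit_fraction(a: Matrix, i: int, j: int, length: int) -> float:
    """(1/L) * sum_{d=1..L} (A^d)_{ij}: expected share of visits on j for walks of exactly L steps from i."""
    power, total = a, 0.0
    for d in range(1, length + 1):
        if d > 1:
            power = mat_mul(power, a)
        total += power[i][j]
    return total / length


def simulate_visit_fraction(a: Matrix, i: int, j: int, length: int,
                            walks: int, rng: random.Random) -> float:
    """Run `walks` independent walks of `length` steps from i; return visits on j / all visits."""
    nodes = range(len(a))
    hits = 0
    for _ in range(walks):
        node = i
        for _ in range(length):
            node = rng.choices(nodes, weights=a[node])[0]  # one transition drawn from row A[node]
            if node == j:
                hits += 1
    return hits / (walks * length)  # every step is one visit, so there are walks * length visits


A = [[0.0, 0.5, 0.5],
     [0.2, 0.0, 0.8],
     [0.6, 0.4, 0.0]]
rng = random.Random(0)  # fixed seed for reproducibility
exact = expected_visit_fraction(A, 0, 2, 5)
estimate = simulate_visit_fraction(A, 0, 2, 5, 20000, rng)
print(f"exact={exact:.4f} estimate={estimate:.4f}")
```

The gap between the two numbers shrinks roughly in proportion to \( 1/\sqrt{\text{walks}} \), so the number of walks trades estimation accuracy for running time.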
About this article
Cite this article
Liu, M., Zhao, Y., Qin, B. et al. Collective entity linking: a random walk-based perspective. Knowl Inf Syst 60, 1611–1643 (2019). https://doi.org/10.1007/s10115-018-1273-z
DOI: https://doi.org/10.1007/s10115-018-1273-z