Collective entity linking: a random walk-based perspective

Liu, Ming; Zhao, Yanyan; Qin, Bing; Liu, Ting

doi:10.1007/s10115-018-1273-z

Regular Paper
Published: 26 September 2018

Volume 60, pages 1611–1643, (2019)

Knowledge and Information Systems

Ming Liu ORCID: orcid.org/0000-0001-7915-1001¹,
Yanyan Zhao²,
Bing Qin¹ &
…
Ting Liu¹

Abstract

With the large number of name mentions appearing on the web, entity linking has recently become an active research topic. In entity linking, an entity in a knowledge resource is assigned to a name mention to help users grasp the meaning of that mention. Unfortunately, as in word sense disambiguation, a name mention considered without its context can refer to several entities. Name mentions that frequently co-occur are usually related, so they can be considered together to determine their most suitable entities. This approach is called collective entity linking and is often conducted on an entity graph. However, traditional collective entity linking methods either consume much time because of the large scale of the entity graph or obtain low accuracy because they simplify the graph to boost speed. To improve both accuracy and efficiency, this paper proposes a novel collective entity linking algorithm. It constructs a complete entity graph by connecting any two related entities, and it measures the relationship between two entities with a random walk-based method. The relationships between entities are then modeled as a relationship matrix, and a hill-climbing-based algorithm is proposed that casts the entity linking task as a sub-matrix search problem. Experimental results demonstrate that the proposed algorithm obtains accurate linking results with low running time.
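
To make the sub-matrix view concrete, the following is a minimal Python sketch of generic hill climbing over a relationship matrix. It is an illustration under stated assumptions, not the algorithm of the paper: the abstract does not give the objective function, the starting point, or the neighbourhood, so the pairwise-sum score, the single-swap moves, the function names `score` and `hill_climb`, the matrix `R`, and the toy entity ids are all hypothetical.

```python
from itertools import combinations

def score(R, chosen):
    """Sum of pairwise relatedness inside the sub-matrix selected by `chosen` (assumed objective)."""
    return sum(R[a][b] for a, b in combinations(chosen, 2))

def hill_climb(R, candidates):
    """Pick one entity per mention by repeated single-swap improvement (generic hill climbing)."""
    chosen = [options[0] for options in candidates]  # arbitrary starting point
    best = score(R, chosen)
    improved = True
    while improved:
        improved = False
        for m, options in enumerate(candidates):
            for e in options:
                # Replace the entity of mention m by candidate e and keep the swap if it helps.
                trial = chosen[:m] + [e] + chosen[m + 1:]
                s = score(R, trial)
                if s > best:
                    chosen, best, improved = trial, s, True
    return chosen, best

# Hypothetical symmetric relationship matrix over five entities (ids 0..4).
R = [[0.00, 0.00, 0.10, 0.05, 0.05],
     [0.00, 0.00, 0.05, 0.90, 0.80],
     [0.10, 0.05, 0.00, 0.00, 0.02],
     [0.05, 0.90, 0.00, 0.00, 0.90],
     [0.05, 0.80, 0.02, 0.90, 0.00]]

# Three mentions: the first two are ambiguous, the third has a single candidate.
candidates = [[0, 1], [2, 3], [4]]
chosen, best = hill_climb(R, candidates)
```

Hill climbing can stop at a local optimum, so in practice the result may depend on the starting selection; the sketch uses the first candidate of every mention only for simplicity.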

References

Meij E, Balog K, Odijk D (2013) Entity linking and retrieval. In: Proceedings of the 37th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 1127–1127
Shen W, Wang JY, Han JW (2015) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans Knowl Data Eng 27:443–460
He ZY, Liu SJ, Song Y, Li M, Zhou M, Wang HF (2013) Efficient collective entity linking with stacking. In: Proceedings of the 2013 conference on empirical methods in natural language processing. ACL, Stroudsburg, pp 426–435
Hachey B, Radford W, Nothman J, Honnibal M, Curran JR (2013) Evaluating entity linking with Wikipedia. Artif Intell 194:130–150
Sen P (2012) Collective context-aware topic models for entity disambiguation. In: Proceedings of the 21st international conference on World Wide Web. ACM, New York, pp 729–738
Namata GM, Kok S, Getoor L (2011) Collective graph identification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 87–95
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. ACM, New York, pp 121–124
Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Discovering and maintaining links on the web of data. In: Proceedings of the 8th international semantic web conference, SWSA, Karlsruhe, Germany, pp 650–665
Ngomo A-CN, Auer S (2011) LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the 22nd international joint conference on artificial intelligence. AAAI, Palo Alto, pp 2312–2317
Benjelloun O, Garcia-Molina H, Menestrina D, Su Q, Whang SE, Widom J (2009) Swoosh: a generic approach to entity resolution. Int J Very Large Data Bases 18:255–276
Blanco R, Ottaviano G, Meij E (2015) Fast and space-efficient entity linking in queries. In: Proceedings of the 8th ACM international conference on web search and data mining. ACM, New York, pp 179–188
Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) KORE: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, New York, pp 545–554
Cucerzan S (2007) Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 conference on empirical methods in natural language processing and computational natural language learning. ACL, Prague, pp 708–716
Shen W, Han JW, Wang JY (2014) A probabilistic model for linking named entities in web text with heterogeneous information networks. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, New York, pp 1199–1210
He ZY, Liu SJ, Li M, Zhou M, Zhang LK, Wang HF (2013) Learning entity representation for entity disambiguation. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics. ACL, Stroudsburg, pp 30–34
Zheng ZC, Li FT, Huang ML, Zhu XY (2010) Learning to link entities with knowledge base. In: Proceedings of the 2010 annual conference of the North American chapter of the ACL. ACL, Stroudsburg, pp 483–491
Zuo Z, Kasneci G, Grütze T, Naumann F (2014) BEL: bagging for entity linking. In: Proceedings of the 25th international conference on computational linguistics. ICCL, Dublin, pp 2075–2086
Han XP, Sun L (2011) A generative entity-mention model for linking entities with knowledge base. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 945–954
Bunescu RC, Pasca M (2006) Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics. ACL, Trento, pp 9–16
Han XP, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 765–774
Usbeck R, Ngomo A-CN, Röder M, Gerber D, Coelho SA, Auer S, Both A (2014) AGDISTIS-Agnostic disambiguation of named entities using linked open data. In: Proceedings of the 21st European conference on artificial intelligence. IOS, Amsterdam, pp 1113–1114
Hachey B, Radford W, Curran JR (2011) Graph-based named entity linking with Wikipedia. In: Proceedings of the 12th international conference on web information system engineering. Springer, Berlin, pp 213–226
Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information and knowledge management. ACM, New York, pp 139–148
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792
Kulkarni S, Singh A, Ramakrishnan G, Chakrabarti S (2009) Collective annotation of Wikipedia entities in web text. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, London, pp 457–466
Guo M, Liu Y, Li J, Li H, Xu B (2014) A knowledge based approach for tackling mislabeled multi-class big social data. In: Proceedings of the 11th semantic web: trends and challenges. Springer, Anissaras, pp 349–363
Tang M, Agrawal P, Nie F, Pongpaichet S, Jain R (2016) A graph based multimodal geospatial interpolation framework. In: Proceedings of the 2016 IEEE international conference on multimedia and expo. IEEE, Seattle, pp 1–6
Tang M, Nie F, Jain R (2016) Capped LP-norm graph embedding for photo clustering. In: Proceedings of the 2016 ACM on multimedia conference. ACM, Amsterdam, pp 431–435
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) LINE: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, Florence, pp 1067–1077
Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 701–710
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, San Francisco, pp 1225–1234
Nguyen DB, Hoffart J, Theobald M, Weikum G (2014) AIDA-light: high-throughput named-entity disambiguation. In: Linked data on the web, WWW, Seoul
Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to Wikipedia. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 1375–1384
Ristad ES, Yianilos PN (1998) Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell 20:522–532
Cohen S (2013) Indexing for subtree similarity-search using edit distance. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, pp 49–60
Guo YH, Che WX, Liu T, Li S (2011) A graph-based method for entity linking. In: Proceedings of the 5th international joint conference on natural language processing. ACL, Stroudsburg, pp 1010–1018
Wang W, Xiao C, Lin XM, Zhang CQ (2009) Efficient approximate entity extraction with edit distance constraints. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data. ACM, Providence, pp 759–770
Kim HM, Biehl M (2005) Exploiting the small-worlds of the semantic web to connect heterogeneous, local ontologies. Inf Technol Manag 6:89–96
Ahn J, Kim K (2012) Lower bound on expected complexity of depth-first tree search with multiple radii. IEEE Commun Lett 16:805–808
Langville AN, Meyer CD (2006) Updating Markov chains with an eye on Google’s PageRank. SIAM J Matrix Anal Appl 27:968–987
Hoffart J, Altun Y, Weikum G (2014) Discovering emerging entities with ambiguous names. In: Proceedings of the 23rd international conference on World Wide Web. ACM, New York, pp 385–396
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792
Ji H, Nothman J, Hachey B (2014) Overview of TAC-KBP2014 entity discovery and linking tasks. In: Proceedings of the text analysis conference, NIST, Gaithersburg, pp 1–15
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1, Article 2
Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining. ACM, Chicago, pp 177–187
McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Proceedings of the 2012 advances in neural information processing systems, Lake Tahoe, Nevada, USA, pp 1–9

Acknowledgements

We thank Dr. Lei Chen for enthusiastic and selfless help in re-implementing many baseline algorithms. We also thank the anonymous reviewers for their many helpful and insightful suggestions, which helped us revise the paper. This work is supported by the National Natural Science Foundation of China (Nos. 61632011, 61772156, and 61702137), Microsoft Research Asia, the CCF-Tencent Open Fund (No. CCF-TencentIAGR20160109), HIT-Tencent (No. AGR201601), and the 2015 Guangdong Provincial Key Platform Project-the Youth Innovative Talent Funding.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
Ming Liu, Bing Qin & Ting Liu
School of Mechatronics Engineering, Harbin Institute of Technology, Harbin, 150001, China
Yanyan Zhao

Corresponding author

Correspondence to Yanyan Zhao.

Appendix

Lemma 3.1

$E\left( \overline{\pi_{ij}} \right) = \overrightarrow{P(i,j)}$, where $\overline{\pi_{ij}}$ denotes the number of visits to node j by the random walks that all start from node i, divided by the number of visits to all nodes, and $\overrightarrow{P(i,j)}$ denotes the probability that a random walk starting from node i ends at another node j.

Proof

Let A denote the transition matrix. Then $\overrightarrow{P(i,j)}$ can be calculated by

$$ \overrightarrow{P(i,j)} = \frac{\sum\limits_{d = 1}^{l} \left[ A^{d} \right]_{ij}}{l} = \frac{\left[ A^{1} \right]_{ij} + \left[ A^{2} \right]_{ij} + \left[ A^{3} \right]_{ij} + \cdots + \left[ A^{l} \right]_{ij}}{l} $$

(19)

where l denotes the length of the path between two nodes i and j. Since the path length between two candidates is limited to at most 5, l is at most 5. $A_{ij}$ denotes the entry in the i-th row and j-th column of A. Since $A_{ij}$ is the transition probability from node i to node j, as indicated in [42], $\left[ A^{d} \right]_{ij}$ is the probability that a random walk starting from node i is at node j after d steps.
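
As an illustration of Eq. 19, the short Python sketch below averages the first l powers of a transition matrix. The function names `mat_mul` and `path_probability` and the 3-node matrix are hypothetical and chosen only for this example; the real entity graph is far larger and would call for sparse matrices.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def path_probability(A, l=5):
    """Return the matrix whose (i, j) entry is (1/l) * sum_{d=1..l} [A^d]_ij (Eq. 19)."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for _ in range(l):
        power = mat_mul(power, A)  # power now holds A^d for the current d
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return [[total[i][j] / l for j in range(n)] for i in range(n)]

# Hypothetical transition matrix; every row sums to 1.
A = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
P = path_probability(A, l=5)
```

Because every power of a row-stochastic matrix is again row-stochastic, each row of `P` also sums to 1, which gives a quick sanity check on the computation.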

Let $VI_{ij}$ denote the number of visits to node j by a random walk that starts from node i. The expectation of $VI_{ij}$ is

$$ E\left( VI_{ij} \right) = \sum\limits_{d = 1}^{l} \left[ A^{d} \right]_{ij} = \left[ A^{1} \right]_{ij} + \left[ A^{2} \right]_{ij} + \left[ A^{3} \right]_{ij} + \cdots + \left[ A^{l} \right]_{ij} $$

(20)

Using $VI_{ij}$, the estimate in Eq. 13 can be rewritten as

$$ \overline{{\pi_{ij} }} = \frac{{\sum\nolimits_{t = 1}^{m} {VI_{ij} \left( t \right)} }}{ml} $$

(21)

where $VI_{ij}(t)$ denotes the number of visits to node j by the t-th random walk that starts from node i, and m is the number of random walks started from node i.

The random walks starting from different nodes are independent of each other, and so are the random walks starting from the same node at different times. Each $VI_{ij}(t)$ therefore has the same expectation $E\left( VI_{ij} \right)$, and by linearity of expectation we have

$$ E\left( \overline{\pi_{ij}} \right) = E\left( \frac{\sum\nolimits_{t = 1}^{m} VI_{ij}(t)}{ml} \right) = \frac{1}{ml}\sum\limits_{t = 1}^{m} E\left( VI_{ij} \right) $$

(22)

From Eqs. 20 and 22, we have

$$ E\left( \overline{\pi_{ij}} \right) = \frac{1}{ml}\sum\limits_{t = 1}^{m} E\left( VI_{ij} \right) = \frac{1}{ml}\sum\limits_{t = 1}^{m} \left( \sum\limits_{d = 1}^{l} \left[ A^{d} \right]_{ij} \right) = \frac{1}{l}\sum\limits_{d = 1}^{l} \left[ A^{d} \right]_{ij} $$

(23)

Substituting Eq. 19 into Eq. 23 gives

$$ E\left( {\overline{{\pi_{ij} }} } \right) = \overrightarrow {{P\left( {i,j} \right)}} $$

(24)

□
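
The lemma can also be checked numerically. The Python sketch below simulates m random walks of l steps from a start node, forms the visit ratio of Eq. 21, and compares it with the exact average of matrix powers from Eq. 19. The function names `simulate_visit_ratio` and `exact_row`, the 3-node transition matrix, the seed, and the values of m and l are assumptions made only for this illustration.

```python
import random

def simulate_visit_ratio(A, i, m, l, rng):
    """Estimate pi_bar[i][j] for every j: visits to j over m walks of l steps from i, divided by m*l."""
    n = len(A)
    nodes = list(range(n))
    visits = [0] * n
    for _ in range(m):
        current = i
        for _ in range(l):
            # Move to the next node according to the transition probabilities of the current row.
            current = rng.choices(nodes, weights=A[current])[0]
            visits[current] += 1
    return [v / (m * l) for v in visits]

def exact_row(A, i, l):
    """Row i of (1/l) * sum_{d=1..l} A^d, obtained by propagating a distribution for l steps."""
    n = len(A)
    dist = [1.0 if k == i else 0.0 for k in range(n)]
    total = [0.0] * n
    for _ in range(l):
        dist = [sum(dist[k] * A[k][j] for k in range(n)) for j in range(n)]
        total = [t + p for t, p in zip(total, dist)]
    return [t / l for t in total]

# Hypothetical transition matrix; every row sums to 1.
A = [[0.0, 0.7, 0.3],
     [0.2, 0.0, 0.8],
     [0.5, 0.5, 0.0]]
rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = simulate_visit_ratio(A, i=0, m=20000, l=5, rng=rng)
exact = exact_row(A, i=0, l=5)
```

With enough walks the entries of `estimate` should lie close to those of `exact`, in line with the statement that the visit ratio is an unbiased estimate of the averaged path probability.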

About this article

Cite this article

Liu, M., Zhao, Y., Qin, B. et al. Collective entity linking: a random walk-based perspective. Knowl Inf Syst 60, 1611–1643 (2019). https://doi.org/10.1007/s10115-018-1273-z

Received: 17 May 2017
Revised: 21 December 2017
Accepted: 16 June 2018
Published: 26 September 2018
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s10115-018-1273-z
