
Collective entity linking: a random walk-based perspective

  • Regular Paper
Knowledge and Information Systems

Abstract

With the large number of name mentions appearing on the web, entity linking has recently become an active research topic. Entity linking assigns an entity from a resource to a name mention to help users grasp the meaning of that mention. Unfortunately, as in word sense disambiguation, a name mention considered without its context can refer to several entities. Name mentions that frequently co-occur are related and can be considered together to determine their most suitable entities. This approach is called collective entity linking and is often conducted on an entity graph. However, traditional collective entity linking methods either consume much time because of the large scale of the entity graph or achieve low accuracy because they simplify the graph to gain speed. To improve both accuracy and efficiency, this paper proposes a novel collective entity linking algorithm. It constructs a complete entity graph by connecting any two related entities, and measures the relationship between two entities by a random walk-based calculation. The relationships between entities are then modeled as a relationship matrix, and a hill-climbing-based algorithm is proposed that turns the entity linking task into a sub-matrix search problem. Experimental results demonstrate that our linking algorithm achieves both accurate linking results and low running time.
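To make the sub-matrix search idea concrete, the following is a minimal Python sketch of a generic hill-climbing search over a relationship matrix. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the function names `coherence` and `hill_climb`, the objective (sum of pairwise relatedness among the chosen entities), and the toy matrix `R` are all hypothetical.

```python
import numpy as np

def coherence(R, choice):
    """Sum of pairwise relatedness inside the sub-matrix selected by `choice`."""
    idx = np.array(choice)
    sub = R[np.ix_(idx, idx)]
    # Each unordered pair appears twice in a symmetric matrix; drop the diagonal.
    return (sub.sum() - np.trace(sub)) / 2.0

def hill_climb(R, candidates):
    """Pick one candidate entity per mention by greedy single-swap improvement.

    R          : symmetric relatedness matrix over all candidate entities (assumed given)
    candidates : list of lists; candidates[k] holds entity indices for mention k
    """
    choice = [cands[0] for cands in candidates]  # assumed start: first candidate of each mention
    best = coherence(R, choice)
    improved = True
    while improved:
        improved = False
        for k, cands in enumerate(candidates):
            for c in cands:
                if c == choice[k]:
                    continue
                trial = choice[:k] + [c] + choice[k + 1:]
                score = coherence(R, trial)
                if score > best:  # accept only strict improvements
                    choice, best, improved = trial, score, True
    return choice, best

# Hypothetical toy data: two mentions with two candidate entities each.
R = np.array([[0.0, 0.0, 0.2, 0.1],
              [0.0, 0.0, 0.3, 0.9],
              [0.2, 0.3, 0.0, 0.0],
              [0.1, 0.9, 0.0, 0.0]])
candidates = [[0, 1], [2, 3]]
choice, score = hill_climb(R, candidates)
```

Like any hill-climbing procedure, this sketch can stop at a local optimum, so the starting assignment matters.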


References

  1. Meij E, Balog K, Odijk D (2013) Entity linking and retrieval. In: Proceedings of the 37th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 1127–1127

  2. Shen W, Wang JY, Han JW (2015) Entity linking with a knowledge base: issues, techniques, and solutions. IEEE Trans Knowl Data Eng 27:443–460

  3. He ZY, Liu SJ, Song Y, Li M, Zhou M, Wang HF (2013) Efficient collective entity linking with stacking. In: Proceedings of the 2013 conference on empirical methods in natural language processing. ACL, Stroudsburg, pp 426–435

  4. Hachey B, Radford W, Nothman J, Honnibal M, Curran JR (2013) Evaluating entity linking with Wikipedia. Artif Intell 194:130–150

  5. Sen P (2012) Collective context-aware topic models for entity disambiguation. In: Proceedings of the 21st international conference on World Wide Web. ACM, New York, pp 729–738

  6. Namata GM, Kok S, Getoor L (2011) Collective graph identification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 87–95

  7. Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th international conference on semantic systems. ACM, New York, pp 121–124

  8. Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Discovering and maintaining links on the web of data. In: Proceedings of the 8th international semantic web conference, SWSA, Karlsruhe, Germany, pp 650–665

  9. Ngomo A-CN, Auer S (2011) LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the 22nd international joint conference on artificial intelligence. AAAI, Palo Alto, pp 2312–2317

  10. Benjelloun O, Garcia-Molina H, Menestrina D, Su Q, Whang SE, Widom J (2009) Swoosh: a generic approach to entity resolution. Int J Very Large Data Bases 18:255–276

  11. Blanco R, Ottaviano G, Meij E (2015) Fast and space-efficient entity linking in queries. In: Proceedings of the 8th ACM international conference on web search and data mining. ACM, New York, pp 179–188

  12. Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) KORE: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, New York, pp 545–554

  13. Cucerzan S (2007) Large-scale named entity disambiguation based on Wikipedia data. In: Proceedings of the 2007 conference on empirical methods in natural language processing and computational natural language learning. ACL, Prague, pp 708–716

  14. Shen W, Han JW, Wang JY (2014) A probabilistic model for linking named entities in web text with heterogeneous information networks. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, New York, pp 1199–1210

  15. He ZY, Liu SJ, Li M, Zhou M, Zhang LK, Wang HF (2013) Learning entity representation for entity disambiguation. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics. ACL, Stroudsburg, pp 30–34

  16. Zheng ZC, Li FT, Huang ML, Zhu XY (2010) Learning to link entities with knowledge base. In: Proceedings of the 2010 annual conference of the North American chapter of the ACL. ACL, Stroudsburg, pp 483–491

  17. Zuo Z, Kasneci G, Grütze T, Naumann F (2014) BEL: bagging for entity linking. In: Proceedings of the 25th international conference on computational linguistics. ICCL, Dublin, pp 2075–2086

  18. Han XP, Sun L (2011) A generative entity-mention model for linking entities with knowledge base. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 945–954

  19. Bunescu RC, Pasca M (2006) Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics. ACL, Trento, pp 9–16

  20. Han XP, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval. ACM, New York, pp 765–774

  21. Usbeck R, Ngomo A-CN, Röder M, Gerber D, Coelho SA, Auer S, Both A (2014) AGDISTIS-Agnostic disambiguation of named entities using linked open data. In: Proceedings of the 21st European conference on artificial intelligence. IOS, Amsterdam, pp 1113–1114

  22. Hachey B, Radford W, Curran JR (2011) Graph-based named entity linking with Wikipedia. In: Proceedings of the 12th international conference on web information system engineering. Springer, Berlin, pp 213–226

  23. Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information and knowledge management. ACM, New York, pp 139–148

  24. Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792

  25. Kulkarni S, Singh A, Ramakrishnan G, Chakrabarti S (2009) Collective annotation of Wikipedia entities in web text. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, London, pp 457–466

  26. Guo M, Liu Y, Li J, Li H, Xu B (2014) A knowledge based approach for tackling mislabeled multi-class big social data. In: Proceedings of the 11th semantic web: trends and challenges. Springer, Anissaras, pp 349–363

  27. Tang M, Agrawal P, Nie F, Pongpaichet S, Jain R (2016) A graph based multimodal geospatial interpolation framework. In: Proceedings of the 2016 IEEE international conference on multimedia and expo. IEEE, Seattle, pp 1–6

  28. Tang M, Nie F, Jain R (2016) Capped LP-norm graph embedding for photo clustering. In: Proceedings of the 2016 ACM on multimedia conference. ACM, Amsterdam, pp 431–435

  29. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326

  30. Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323

  31. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) LINE: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web. ACM, Florence, pp 1067–1077

  32. Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 701–710

  33. Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, San Francisco, pp 1225–1234

  34. Nguyen DB, Hoffart J, Theobald M, Weikum G (2014) AIDA-light: high-throughput named-entity disambiguation. In: Linked data on the web, WWW, Seoul

  35. Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to Wikipedia. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics. ACL, Portland, pp 1375–1384

  36. Ristad ES, Yianilos PN (1998) Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell 20:522–532

  37. Cohen S (2013) Indexing for subtree similarity-search using edit distance. In: Proceedings of the 2013 ACM SIGMOD international conference on management of data, pp 49–60

  38. Guo YH, Che WX, Liu T, Li S (2011) A graph-based method for entity linking. In: Proceedings of the 5th international joint conference on natural language processing. ACL, Stroudsburg, pp 1010–1018

  39. Wang W, Xiao C, Lin XM, Zhang CQ (2009) Efficient approximate entity extraction with edit distance constraints. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data. ACM, Providence, pp 759–770

  40. Kim HM, Biehl M (2005) Exploiting the small-worlds of the semantic web to connect heterogeneous, local ontologies. Inf Technol Manag 6:89–96

  41. Ahn J, Kim K (2012) Lower bound on expected complexity of depth-first tree search with multiple radii. IEEE Commun Lett 16:805–808

  42. Langville AN, Meyer CD (2006) Updating Markov chains with an eye on Google’s PageRank. SIAM J Matrix Anal Appl 27:968–987

  43. Hoffart J, Altun Y, Weikum G (2014) Discovering emerging entities with ambiguous names. In: Proceedings of the 23rd international conference on World Wide Web. ACM, New York, pp 385–396

  44. Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing. ACL, Edinburgh, pp 782–792

  45. Ji H, Nothman J, Hachey B (2014) Overview of TAC-KBP2014 entity discovery and linking tasks. In: Proceedings of the text analysis conference, NIST, Gaithersburg, pp 1–15

  46. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1, Article 2

  47. Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery in data mining. ACM, Chicago, pp 177–187

  48. McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Proceedings of the 2012 advances in neural information processing systems, Lake Tahoe, Nevada, USA, pp 1–9

Acknowledgements

We thank Dr. Lei Chen for enthusiastic and selfless help in re-implementing many baseline algorithms. We also thank the anonymous reviewers for their many helpful and insightful suggestions for revising our paper. This work is supported by the National Natural Science Foundation of China (Nos. 61632011, 61772156, and 61702137), Microsoft Research Asia, the CCF-Tencent Open Fund (No. CCF-TencentIAGR20160109), HIT-Tencent (No. AGR201601), and the 2015 Guangdong Provincial Key Platform Project-the Youth Innovative Talent Funding.

Author information

Corresponding author

Correspondence to Yanyan Zhao.

Appendix

Lemma 3.1

\( E\left( \overline{\pi_{ij}} \right) = \overrightarrow{P(i,j)} \), where \( \overline{\pi_{ij}} \) denotes the number of visits to node \( j \) by the random walks that all start from node \( i \), divided by the number of visits to all nodes, and \( \overrightarrow{P(i,j)} \) denotes the probability that a random walk starting from node \( i \) ends at node \( j \).

Proof

Let \( A \) denote the transition matrix. \( \overrightarrow{P(i,j)} \) can be calculated by

$$ \overrightarrow{P(i,j)} = \frac{\sum_{d=1}^{l} \left[ A^{d} \right]_{ij}}{l} = \frac{\left[ A^{1} \right]_{ij} + \left[ A^{2} \right]_{ij} + \left[ A^{3} \right]_{ij} + \cdots + \left[ A^{l} \right]_{ij}}{l} $$
(19)

where \( l \) denotes the length of the path between two nodes \( i \) and \( j \). Since we limit the maximum path length between two candidates to 5, \( l \) is at most 5. \( A_{ij} \) denotes the entry in the \( i \)th row and \( j \)th column of \( A \). Since \( A_{ij} \) is the transition probability from node \( i \) to node \( j \), as indicated in [42], \( \left[ A^{d} \right]_{ij} \) is the probability that a random walk starting from node \( i \) visits node \( j \) after \( d \) steps.
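As a concrete illustration of Eq. 19, the sketch below averages the first \( l \) powers of a transition matrix with NumPy. The function name `walk_probability` and the 3-node matrix are assumptions made for this example only; they do not come from the paper.

```python
import numpy as np

def walk_probability(A, l):
    """Return the matrix whose (i, j) entry is (1/l) * sum_{d=1..l} [A^d]_ij (Eq. 19)."""
    A = np.asarray(A, dtype=float)
    power = np.eye(A.shape[0])   # A^0
    total = np.zeros_like(A)
    for _ in range(l):
        power = power @ A        # power now holds A^d for the current d
        total += power
    return total / l

# Hypothetical row-stochastic transition matrix on three nodes.
A = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
P = walk_probability(A, l=2)
```

Because every power of a row-stochastic matrix is row-stochastic, each row of `P` sums to 1, which is a quick sanity check on an implementation.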

Let \( VI_{ij} \) denote the estimated number of visits to node \( j \) by a random walk that starts from node \( i \). The expectation of \( VI_{ij} \) is

$$ E\left( VI_{ij} \right) = \sum_{d=1}^{l} \left[ A^{d} \right]_{ij} = \left[ A^{1} \right]_{ij} + \left[ A^{2} \right]_{ij} + \left[ A^{3} \right]_{ij} + \cdots + \left[ A^{l} \right]_{ij} $$
(20)

Using \( VI_{ij} \), the estimate in Eq. 13 can be rewritten as

$$ \overline{\pi_{ij}} = \frac{\sum_{t=1}^{m} VI_{ij}(t)}{ml} $$
(21)

where \( VI_{ij}(t) \) denotes the number of visits to node \( j \) by the \( t \)th random walk that starts from node \( i \), and \( m \) is the number of such walks.

Because random walks starting from different nodes are independent of each other, and random walks starting from the same node at different times are also independent of each other, we have

$$ E\left( \overline{\pi_{ij}} \right) = E\left( \frac{\sum_{t=1}^{m} VI_{ij}(t)}{ml} \right) = \frac{1}{ml} \sum_{t=1}^{m} E\left( VI_{ij} \right) $$
(22)

From Eqs. 20 and 22, we have

$$ E\left( \overline{\pi_{ij}} \right) = \frac{1}{ml} \sum_{t=1}^{m} E\left( VI_{ij} \right) = \frac{1}{ml} \sum_{t=1}^{m} \left( \sum_{d=1}^{l} \left[ A^{d} \right]_{ij} \right) = \frac{1}{l} \sum_{d=1}^{l} \left[ A^{d} \right]_{ij} $$
(23)

Substituting Eq. 19 into Eq. 23, we obtain

$$ E\left( \overline{\pi_{ij}} \right) = \overrightarrow{P(i,j)} $$
(24)
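Lemma 3.1 can be checked numerically with a small Monte Carlo simulation. The following Python sketch is only an illustration: the helper names, the toy transition matrix, the seed, and the values of \( m \) and \( l \) are assumptions and are not taken from the paper. It runs \( m \) walks of \( l \) steps from node \( i \), counts visits as in Eq. 21, and compares the result with the exact value from Eq. 19.

```python
import numpy as np

def exact_walk_probability(A, l):
    """Eq. 19: (1/l) * sum_{d=1..l} A^d, computed for all node pairs."""
    power = np.eye(A.shape[0])
    total = np.zeros_like(A)
    for _ in range(l):
        power = power @ A
        total += power
    return total / l

def estimate_visit_ratio(A, i, l, m, rng):
    """Eq. 21: visits to each node over m walks of l steps from node i, divided by m*l."""
    n = A.shape[0]
    visits = np.zeros(n)
    for _ in range(m):
        node = i
        for _ in range(l):
            node = rng.choice(n, p=A[node])  # one transition step
            visits[node] += 1                # the start node itself is not counted
    return visits / (m * l)

# Hypothetical row-stochastic transition matrix and simulation settings.
A = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
rng = np.random.default_rng(0)
exact = exact_walk_probability(A, l=3)[0]
estimate = estimate_visit_ratio(A, i=0, l=3, m=5000, rng=rng)
```

With enough walks, `estimate` should approach `exact`, in line with Eq. 24; the gap shrinks roughly with the square root of \( m \).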

Cite this article

Liu, M., Zhao, Y., Qin, B. et al. Collective entity linking: a random walk-based perspective. Knowl Inf Syst 60, 1611–1643 (2019). https://doi.org/10.1007/s10115-018-1273-z

  • DOI: https://doi.org/10.1007/s10115-018-1273-z
