DOI: 10.1145/2588555.2593676
research-article

A probabilistic model for linking named entities in web text with heterogeneous information networks

Published: 18 June 2014

Abstract

Heterogeneous information networks, which consist of multi-type, interconnected objects, are becoming ubiquitous; examples include social media networks and bibliographic networks. Linking named entity mentions detected in unstructured Web text with their corresponding entities in a heterogeneous information network is of practical importance for information network population and enrichment. This task is challenging because of name ambiguity and the limited knowledge available in the information network. Most existing entity linking methods focus on linking entities with Wikipedia or Wikipedia-derived knowledge bases (e.g., YAGO) and depend heavily on features specific to Wikipedia (e.g., Wikipedia articles or Wikipedia-based relatedness measures). Since heterogeneous information networks do not have such features, these previous methods cannot be applied to our task. In this paper, we propose SHINE, to the best of our knowledge the first probabilistic model to link named entities in Web text with a heterogeneous information network. Our model consists of two components: the entity popularity model, which captures the popularity of an entity, and the entity object model, which captures the distribution of multi-type objects appearing in the textual context of an entity and is generated using meta-path constrained random walks over the network. Because different meta-paths express diverse semantic meanings and lead to different distributions over objects, different meta-paths carry different weights in entity linking. We propose an effective iterative approach, based on the expectation-maximization (EM) algorithm, that automatically learns the weight of each meta-path without requiring any training data. Experimental results on a real-world data set demonstrate the effectiveness and efficiency of our proposed model in comparison with the baselines.
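
To make the two components in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy bibliographic network, two meta-paths (author-paper-author and author-paper-venue), hand-set meta-path weights and popularity values, and a small smoothing constant; all node names, `walk`, `score`, `link`, and `EPS` are hypothetical. The abstract says the weights are learned with EM, which this sketch does not attempt.

```python
from collections import defaultdict

# Toy typed adjacency lists; every node name here is hypothetical.
EDGES = {
    ("author", "paper"): {
        "wei_wang_1": ["p1", "p2"],
        "wei_wang_2": ["p3"],
        "jiawei_han": ["p1"],
        "alice_lee": ["p3"],
    },
    ("paper", "author"): {
        "p1": ["wei_wang_1", "jiawei_han"],
        "p2": ["wei_wang_1"],
        "p3": ["wei_wang_2", "alice_lee"],
    },
    ("paper", "venue"): {"p1": ["SIGMOD"], "p2": ["VLDB"], "p3": ["ACL"]},
}

# Hand-set placeholders (assumptions): the paper learns meta-path weights
# with EM and defines its own entity popularity model.
META_PATHS = {
    ("author", "paper", "author"): 0.5,
    ("author", "paper", "venue"): 0.5,
}
POPULARITY = {"wei_wang_1": 2 / 3, "wei_wang_2": 1 / 3}
EPS = 1e-6  # assumed smoothing so an unseen object does not zero the score


def walk(start, meta_path):
    """Distribution over end objects of a random walk from `start`
    that follows the object-type sequence `meta_path`."""
    dist = {start: 1.0}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            nbrs = EDGES[(src_type, dst_type)].get(node, [])
            for nb in nbrs:
                # Move to each neighbour of the required type uniformly.
                nxt[nb] += prob / len(nbrs)
        dist = dict(nxt)
    return dist


def score(entity, context_objects):
    """Popularity times the product, over context objects, of a weighted
    mixture of the per-meta-path walk distributions."""
    dists = {path: walk(entity, path) for path in META_PATHS}
    total = POPULARITY[entity]
    for obj in context_objects:
        mix = sum(w * dists[path].get(obj, 0.0) for path, w in META_PATHS.items())
        total *= mix + EPS
    return total


def link(context_objects, candidates):
    """Pick the candidate entity with the highest score."""
    return max(candidates, key=lambda e: score(e, context_objects))


if __name__ == "__main__":
    # A mention "Wei Wang" whose surrounding text names a coauthor and a venue.
    print(link(["jiawei_han", "SIGMOD"], ["wei_wang_1", "wei_wang_2"]))
```

In this toy setting, a mention whose context contains a coauthor and a venue reachable from one candidate is linked to that candidate even when the name itself is ambiguous; changing the weights in `META_PATHS` shifts how much each kind of context object counts.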

Cited By

  • (2024) SRSCL: A strong-relatedness-sequence-based fine-grained collective entity linking method for heterogeneous information networks. Expert Systems with Applications, 238 (121759). DOI: 10.1016/j.eswa.2023.121759. Online publication date: Mar-2024
  • (2023) Accelerating large-scale weighted similarity queries based on external storage. Information Systems, 117:C. DOI: 10.1016/j.is.2023.102213. Online publication date: 1-Jul-2023
  • (2022) Toward Tweet Entity Linking With Heterogeneous Information Networks. IEEE Transactions on Knowledge and Data Engineering, 34:12 (6003-6017). DOI: 10.1109/TKDE.2021.3068093. Online publication date: 1-Dec-2022

    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN:9781450323765
    DOI:10.1145/2588555

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. domain-specific entity linking
    2. entity linking
    3. heterogeneous information networks

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'14

    Acceptance Rates

    SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%
