research-article

Lightweight Multilingual Entity Extraction and Linking

Authors:

Kapil Thadani

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

Pages 365 - 374

https://doi.org/10.1145/3018661.3018724

Published: 02 February 2017

Abstract

Text analytics systems often rely heavily on detecting entity mentions in documents and linking them to knowledge bases for downstream applications such as sentiment analysis, question answering, and recommender systems. A major challenge for this task is accurately detecting entities in new languages with limited labeled resources. In this paper we present an accurate and lightweight multilingual named entity recognition (NER) and linking (NEL) system. The contributions of this paper are three-fold: 1) lightweight named entity recognition with competitive accuracy; 2) candidate entity retrieval that uses search click-log data and entity embeddings to achieve high precision with a low memory footprint; and 3) efficient entity disambiguation. Our system achieves state-of-the-art performance on TAC KBP 2013 multilingual data and on English AIDA CoNLL data.
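To make the second and third contributions concrete, the following is a minimal sketch of how candidate retrieval from click-log counts and embedding-based disambiguation could fit together. It is not the paper's implementation: the toy tables `CLICK_COUNTS` and `ENTITY_VECS`, the helper names, and the interpolation weight `alpha` are all assumptions made for illustration.

```python
import math

# Hypothetical toy click log: surface form -> {entity: click count}.
CLICK_COUNTS = {
    "jaguar": {"Jaguar_Cars": 60, "Jaguar_(animal)": 40},
}

# Hypothetical 3-dimensional entity embeddings.
ENTITY_VECS = {
    "Jaguar_Cars": [0.9, 0.1, 0.0],
    "Jaguar_(animal)": [0.0, 0.2, 0.9],
}


def candidates(mention):
    """Return candidate entities with a prior estimated from click counts."""
    counts = CLICK_COUNTS.get(mention.lower(), {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: count / total for entity, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(mention, context_vec, alpha=0.5):
    """Pick the candidate maximizing a mix of prior and context similarity.

    alpha is an assumed interpolation weight, not a value from the paper.
    """
    priors = candidates(mention)
    if not priors:
        return None  # no candidate found for this surface form
    scores = {
        entity: alpha * prior + (1 - alpha) * cosine(ENTITY_VECS[entity], context_vec)
        for entity, prior in priors.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, a call such as `disambiguate("Jaguar", context_vec)` takes a context vector assumed to live in the same space as the entity embeddings. The click-derived prior favors the more frequently clicked entity, and the similarity term lets the surrounding context override it.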

Index Terms

Lightweight Multilingual Entity Extraction and Linking
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Discourse, dialogue and pragmatics
      2. Information extraction
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction

Published In

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

February 2017

868 pages

ISBN:9781450346757

DOI:10.1145/3018661

General Chairs: Maarten de Rijke (University of Amsterdam), Milad Shokouhi (Microsoft)

Program Chairs: Andrew Tomkins (Google), Min Zhang (Tsinghua University)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM 2017

WSDM 2017: Tenth ACM International Conference on Web Search and Data Mining

February 6 - 10, 2017

Cambridge, United Kingdom

Acceptance Rates

WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%
