DOI: 10.1145/3018661.3018724
research-article

Lightweight Multilingual Entity Extraction and Linking

Published: 02 February 2017

Abstract

Text analytics systems often rely heavily on detecting entity mentions in documents and linking them to knowledge bases for downstream applications such as sentiment analysis, question answering, and recommender systems. A major challenge for this task is accurately detecting entities in new languages with limited labeled resources. In this paper, we present an accurate and lightweight multilingual named entity recognition (NER) and linking (NEL) system. The contributions of this paper are three-fold: 1) lightweight named entity recognition with competitive accuracy; 2) candidate entity retrieval that uses search click-log data and entity embeddings to achieve high precision with a low memory footprint; and 3) efficient entity disambiguation. Our system achieves state-of-the-art performance on TAC KBP 2013 multilingual data and on English AIDA CoNLL data.

Published In

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN:9781450346757
DOI:10.1145/3018661
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering entities
  2. document processing
  3. entity extraction
  4. entity linking
  5. natural language processing
  6. unsupervised learning

Qualifiers

  • Research-article

Conference

WSDM 2017

Acceptance Rates

WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 17 Jan 2025

Citations

Cited By

  • (2023) A Purely Entity-Based Semantic Search Approach for Document Retrieval. Applied Sciences 13:18 (10285). DOI: 10.3390/app131810285. Online publication date: 14-Sep-2023
  • (2023) Intelligent Document Processing in End-to-End RPA Contexts: A Systematic Literature Review. Confluence of Artificial Intelligence and Robotic Process Automation (95-131). DOI: 10.1007/978-981-19-8296-5_5. Online publication date: 14-Mar-2023
  • (2022) KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports. 2022 26th International Conference on Pattern Recognition (ICPR) (606-612). DOI: 10.1109/ICPR56361.2022.9956191. Online publication date: 21-Aug-2022
  • (2022) KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents. 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA) (1654-1659). DOI: 10.1109/ICMLA55696.2022.00254. Online publication date: Dec-2022
  • (2022) Named Entity Disambiguation Based on Bidirectional Semantic Path. Artificial Intelligence in China (440-447). DOI: 10.1007/978-981-16-9423-3_55. Online publication date: 22-Mar-2022
  • (2021) TENET: Joint Entity and Relation Linking with Coherence Relaxation. Proceedings of the 2021 International Conference on Management of Data (1142-1155). DOI: 10.1145/3448016.3457280. Online publication date: 9-Jun-2021
  • (2021) Mining Domain-specific Component-Action Links for Technical Support Documents. Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD) (323-331). DOI: 10.1145/3430984.3431000. Online publication date: 2-Jan-2021
  • (2021) Identifying Salient Entities of News Articles Using Binary Salient Classifier. 2021 IEEE International Conference on Big Data (Big Data) (1541-1549). DOI: 10.1109/BigData52589.2021.9671567. Online publication date: 15-Dec-2021
  • (2020) REL: An Entity Linker Standing on the Shoulders of Giants. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2197-2200). DOI: 10.1145/3397271.3401416. Online publication date: 25-Jul-2020
  • (2020) Query Understanding for Surfacing Under-served Music Content. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (2765-2772). DOI: 10.1145/3340531.3412741. Online publication date: 19-Oct-2020
