research-article

Incorporating Social Context and Domain Knowledge for Entity Recognition

Authors:

Jimeng Sun

WWW '15: Proceedings of the 24th International Conference on World Wide Web

Pages 517 - 526

https://doi.org/10.1145/2736277.2741135

Published: 18 May 2015

Abstract

Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedicine. Large concept spaces and instance ambiguity are key issues that need to be addressed. Most documents are created in a social context by common authors via social interactions, such as replies and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances?

In this paper, we propose the SOCINST model to formalize the problem into a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may connect with each other, SOCINST can automatically construct a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model is also able to incorporate social relationships between users to help build social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution.
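As a rough illustration of the Dirichlet tree idea mentioned above, the following sketch draws a word distribution from a small tree using only the Python standard library. It is not the SOCINST implementation: the function names, the toy vocabulary, and the edge weights are all hypothetical, chosen only to show how a tree-structured prior can tie related words together.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(node, rng, mass=1.0, out=None):
    """Recursively sample leaf (word) probabilities from a Dirichlet tree.

    A node is either a leaf (a string naming a word) or a list of
    (edge_weight, child) pairs. At each internal node, a Dirichlet draw over
    the outgoing edge weights splits the incoming probability mass.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        out[node] = out.get(node, 0.0) + mass
        return out
    weights = [w for w, _ in node]
    branch_probs = sample_dirichlet(weights, rng)
    for p, (_, child) in zip(branch_probs, node):
        sample_dirichlet_tree(child, rng, mass * p, out)
    return out


# Hypothetical toy tree (an assumption, not data from the paper):
# "iphone" and "ios" sit under a shared subtree with large edge weights,
# so they split that subtree's mass almost evenly in every sample.
tree = [
    (1.0, "apple"),
    (1.0, "banana"),
    (2.0, [(50.0, "iphone"), (50.0, "ios")]),
]

rng = random.Random(0)
word_probs = sample_dirichlet_tree(tree, rng)
for word, prob in sorted(word_probs.items()):
    print(f"{word}: {prob:.3f}")
```

With a flat Dirichlet prior, every word's probability varies independently up to normalization. The tree lets domain knowledge (here, that two words belong together) be expressed through the weights of a shared subtree.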

We evaluate the proposed model on three different genres of datasets: ICDM'12 Contest, Weibo, and I2B2. On the ICDM'12 Contest data, the proposed model clearly outperforms all the top contestants (+21.4%; $p \ll 10^{-5}$ with a t-test). On Weibo and I2B2, our results also show that the recognition accuracy of SOCINST is 5.3-26.6% higher than that of several alternative methods.

Index Terms

Incorporating Social Context and Domain Knowledge for Entity Recognition
1. Applied computing
  1. Law, social and behavioral sciences
    1. Sociology

Published In

WWW '15: Proceedings of the 24th International Conference on World Wide Web

May 2015

1460 pages

ISBN: 9781450334693

General Chairs:
Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France)
Stefano Leonardi (Sapienza University of Rome, Italy)
Alessandro Panconesi (Sapienza University of Rome, Italy)

Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Funding Sources

National Hightech R&D Program
Natural Science Foundation of China

Conference

WWW '15

Sponsor:

IW3C2

WWW '15: 24th International World Wide Web Conference

May 18 - 22, 2015

Florence, Italy

Acceptance Rates

WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
