DOI: 10.1145/2736277.2741135
research-article

Incorporating Social Context and Domain Knowledge for Entity Recognition

Published: 18 May 2015

Abstract

Recognizing entity instances in documents according to a knowledge base is a fundamental problem in many data mining applications. The problem is extremely challenging for short documents in complex domains such as social media and biomedicine. Large concept spaces and instance ambiguity are the key issues that need to be addressed. Most such documents are created in a social context: they are written by common authors and linked through social interactions such as replies and citations. Such social contexts are largely ignored in the instance-recognition literature. How can users' interactions help entity instance recognition? How can the social context be modeled so as to resolve the ambiguity of different instances?
In this paper, we propose the SOCINST model, which formalizes the problem as a probabilistic model. Given a set of short documents (e.g., tweets or paper abstracts) posted by users who may be connected to each other, SOCINST automatically constructs a context of subtopics for each instance, with each subtopic representing one possible meaning of the instance. The model can also incorporate social relationships between users to help build the social context. We further incorporate domain knowledge into the model using a Dirichlet tree distribution.
We evaluate the proposed model on three datasets of different genres: ICDM'12 Contest, Weibo, and I2B2. On ICDM'12 Contest, the proposed model clearly outperforms all the top contestants (+21.4%; p ≪ 1e-5, t-test). On Weibo and I2B2, the recognition accuracy of SOCINST is 5.3-26.6% better than that of several alternative methods.
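
The abstract mentions a Dirichlet tree distribution as the vehicle for domain knowledge. As a rough illustration of that general idea only, and not of SOCINST itself, the sketch below draws a word distribution from a small Dirichlet tree: every internal node draws a Dirichlet over its child edges, and a leaf's probability is the product of the branch probabilities on its path. The function names, the toy vocabulary, and the edge weights are all hypothetical and do not come from the paper.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(node, rng, mass=1.0, out=None):
    """Return {leaf_word: probability} drawn from a Dirichlet tree.

    A node is either a leaf (a string) or an internal node, written as a
    list of (edge_weight, child) pairs. Edge weights act as Dirichlet
    parameters for the branch probabilities at that node.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        # Leaf: accumulate the probability mass that reached it.
        out[node] = out.get(node, 0.0) + mass
        return out
    branch_probs = sample_dirichlet([w for w, _ in node], rng)
    for p, (_, child) in zip(branch_probs, node):
        sample_dirichlet_tree(child, rng, mass * p, out)
    return out


# Hypothetical prior knowledge: "aspirin" and "ibuprofen" should tend to
# receive similar probability, so they share a subtree with large,
# equal edge weights (an assumption made for illustration).
tree = [
    (20.0, [(50.0, "aspirin"), (50.0, "ibuprofen")]),
    (1.0, "tablet"),
    (1.0, "phone"),
]

rng = random.Random(0)
phi = sample_dirichlet_tree(tree, rng)
for word, prob in sorted(phi.items()):
    print(f"{word}: {prob:.3f}")
```

In this toy setup, the large equal weights under the shared subtree keep the two linked words close to each other in probability within any single draw, which is one way such a prior can encode that two words belong together.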

    Published In

    WWW '15: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1460 pages
    ISBN:9781450334693

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. instance recognition
    2. probabilistic model
    3. social network

    Qualifiers

    • Research-article

    Funding Sources

    • National Hightech R&D Program
    • Natural Science Foundation of China

    Conference

    WWW '15
    Sponsor:
    • IW3C2

    Acceptance Rates

    WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
