DOI: 10.1145/2740908.2745396
research-article

Synonym Discovery for Structured Entities on Heterogeneous Graphs

Published: 18 May 2015

Abstract

With the increasing use of entities in serving people's daily information needs, recognizing synonyms (the different ways people refer to the same entity) has become a crucial task for many entity-leveraging applications. Previous work often takes a "literal" view of the entity, i.e., its string name. In this work, we propose adopting a "structured" view of each entity by considering not only its string name but also its other important structured attributes. Unlike existing query log-based methods, we delve deeper to explore sub-queries, and exploit tailed synonyms and tailed web pages to harvest more synonyms. We design a general, heterogeneous graph-based data model that encodes our problem insights by capturing three key concepts (synonym candidate, web page, and keyword) and the different types of interactions between them. We cast synonym discovery as a graph-based ranking problem and demonstrate the existence of a closed-form optimal solution for computing entity synonym scores. Experiments on several real-life domains demonstrate the effectiveness of our proposed method.
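
The abstract does not state the ranking objective or its closed form, so the following is only a minimal sketch of what a closed-form graph-based ranking can look like. It uses a generic stand-in, the normalized-Laplacian regularization f = (1 - alpha)(I - alpha S)^{-1} y, which is not claimed to be the paper's model. The node names, edge weights, the seed choice, and the `alpha` value are all hypothetical; a single symmetric graph is used for the three node types (synonym candidate `c:`, web page `p:`, keyword `k:`).

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rank_scores(nodes, edges, seeds, alpha=0.8):
    """Return f = (1 - alpha) (I - alpha S)^{-1} y, with S = D^{-1/2} W D^{-1/2}."""
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    w = [[0.0] * n for _ in range(n)]
    for u, v, weight in edges:  # undirected, weighted edges
        w[idx[u]][idx[v]] += weight
        w[idx[v]][idx[u]] += weight
    deg = [sum(row) for row in w]
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s_ij = w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] else 0.0
            a[i][j] = (1.0 if i == j else 0.0) - alpha * s_ij
    y = [(1.0 - alpha) if name in seeds else 0.0 for name in nodes]
    return dict(zip(nodes, solve(a, y)))


# Hypothetical toy data: three candidates, two pages, two keywords.
nodes = ["c:microsoft excel 2013", "c:ms excel", "c:powerpoint",
         "p:page_a", "p:page_b", "k:excel", "k:slides"]
edges = [
    ("c:microsoft excel 2013", "p:page_a", 1.0),  # candidate-page (e.g. clicks)
    ("c:ms excel", "p:page_a", 1.0),
    ("c:powerpoint", "p:page_b", 1.0),
    ("c:microsoft excel 2013", "k:excel", 1.0),   # candidate-keyword
    ("c:ms excel", "k:excel", 1.0),
    ("c:powerpoint", "k:slides", 1.0),
    ("p:page_a", "k:excel", 1.0),                 # page-keyword
    ("p:page_b", "k:slides", 1.0),
]
# Assumption: the entity's canonical name acts as the only seed.
scores = rank_scores(nodes, edges, seeds={"c:microsoft excel 2013"})
ranked = sorted((n for n in nodes if n.startswith("c:")),
                key=lambda n: scores[n], reverse=True)
print(ranked)
```

In this toy graph the candidate that shares a page and a keyword with the seed receives a positive score, while the candidate in a disconnected part of the graph receives none. A real system would build the graph from query logs and solve the system with a sparse linear-algebra library rather than dense elimination.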

    Published In

    WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1602 pages
    ISBN:9781450334730
    DOI:10.1145/2740908

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. heterogeneous graph
    2. structured entity
    3. synonym discovery

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2023) Synonym recognition from short texts. Expert Systems with Applications: An International Journal, 224:C. DOI: 10.1016/j.eswa.2023.119966. Online publication date: 15-Aug-2023
    • (2023) A bilateral context and filtering strategy-based approach to Chinese entity synonym set expansion. Complex & Intelligent Systems, 9:5 (6065-6085). DOI: 10.1007/s40747-023-01064-w. Online publication date: 25-Apr-2023
    • (2021) Set-aware Entity Synonym Discovery with Flexible Receptive Fields. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3087532. Online publication date: 2021
    • (2020) Entity Synonym Discovery via Multiple Attentions. Semantic Technology (271-286). DOI: 10.1007/978-3-030-41407-8_18. Online publication date: 14-Feb-2020
    • (2019) Place Deduplication with Embeddings. The World Wide Web Conference (3420-3426). DOI: 10.1145/3308558.3313456. Online publication date: 13-May-2019
    • (2019) Entity Set Expansion for Detecting Fashion Trends. 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) (162-167). DOI: 10.1109/ICMLA.2019.00033. Online publication date: Dec-2019
    • (2017) Automatic Synonym Discovery with Knowledge Bases. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (997-1005). DOI: 10.1145/3097983.3098185. Online publication date: 13-Aug-2017
