DOI: 10.1145/2505515.2505680
research-article

Penguins in sweaters, or serendipitous entity search on user-generated content

Published: 27 October 2013

Abstract

When browsing the Web, users are often searching for specific information or answers to concrete questions. Sometimes, though, they find unexpected yet interesting and useful results, and are encouraged to explore further. What makes a result serendipitous? We propose to answer this question by exploring the potential of entities extracted from two sources of user-generated content (Wikipedia, a user-curated online encyclopedia, and Yahoo! Answers, a less constrained question-answering forum) to promote serendipitous search. In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. We devise an algorithm based on a lazy random walk with restart to retrieve entity recommendations from the networks. We show that our method provides novel results from both datasets, compared to standard web search engines. However, unlike previous research, we find that choosing highly emotional entities does not increase user interest for many categories of entities, suggesting a more complex relationship between subject matter and the desirable metadata attributes in serendipitous search.
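
As a rough illustration of the retrieval step mentioned in the abstract, the sketch below runs a lazy random walk with restart over a small weighted entity graph. It is not the authors' implementation: the toy graph, the function names `lazy_rwr` and `recommend`, and the `restart` and `laziness` values are all assumptions chosen for the example, and the metadata enrichment (sentiment, writing quality, topical category) is left out.

```python
# Minimal sketch of a lazy random walk with restart (assumed formulation):
# at each step the walker jumps back to the seed with probability `restart`;
# otherwise it stays put with probability `laziness` or follows an outgoing
# edge chosen in proportion to its weight.

def lazy_rwr(graph, seed, restart=0.15, laziness=0.5, iters=200, tol=1e-12):
    """Return approximate stationary visit probabilities for each node.

    `graph` maps each node to a dict {neighbor: positive weight}; every
    neighbor must also appear as a key of `graph`.
    """
    nodes = list(graph)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0  # the walk starts at the query entity
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            mass = scores[u]
            if mass == 0.0:
                continue
            nxt[seed] += restart * mass          # restart to the seed
            walk = (1.0 - restart) * mass        # mass that keeps walking
            total = sum(graph[u].values())
            if total == 0:
                nxt[u] += walk                   # dangling node: stay in place
                continue
            nxt[u] += laziness * walk            # lazy step: remain at u
            for v, w in graph[u].items():
                nxt[v] += (1.0 - laziness) * walk * w / total
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


def recommend(graph, seed, k=3, **kwargs):
    """Rank entities other than the seed by their walk score."""
    scores = lazy_rwr(graph, seed, **kwargs)
    ranked = sorted((n for n in scores if n != seed),
                    key=lambda n: (-scores[n], n))
    return ranked[:k]


# Hypothetical toy entity network; the edge weights are made up.
toy = {
    "penguin": {"antarctica": 2.0, "sweater": 1.0},
    "antarctica": {"penguin": 1.0},
    "sweater": {"penguin": 1.0, "knitting": 1.0},
    "knitting": {"sweater": 1.0},
}

if __name__ == "__main__":
    print(recommend(toy, "penguin", k=2))
```

In this formulation the laziness term slows the spread of probability mass away from the seed, so nearby entities keep relatively more weight than in a plain walk with restart; how the paper sets these parameters is not stated in the abstract.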

Supplemental Material

ZIP File
The archive includes all the source files and images used in the paper.

    Published In

    CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
    October 2013
    2612 pages
    ISBN:9781450322638
    DOI:10.1145/2505515
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity networks
    2. entity search
    3. interestingness
    4. metadata
    5. serendipity

    Qualifiers

    • Research-article

    Conference

    CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
    October 27 - November 1, 2013
    San Francisco, California, USA

    Acceptance Rates

    CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
