research-article

"Driving curiosity in search with large-scale entity networks" by Ilaria Bordino, Mounia Lalmas, Yelena Mejova, and Olivier Van Laere with Martin Vesely as coordinator

Published: 08 December 2014

Abstract

In many search scenarios, users come across results that are unexpected yet interesting and useful. Such results make them curious, and they experience serendipity; this curiosity encourages them to explore further. We developed an entity search system designed to support this kind of experience.
The system explores how entities extracted from two of the most popular sources of user-generated content, Wikipedia (a user-curated online encyclopedia) and Yahoo Answers (a less constrained question-answering forum), can promote serendipitous search. The content of each data source is represented as a large network of entities, enriched with metadata about sentiment, writing quality, and topical category. Given an entity query, the system retrieves related entities from the networks using a lazy random walk with restart.
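The retrieval step can be illustrated with a minimal Python sketch. It assumes one common formulation of a lazy random walk with restart: at each step the walker jumps back to the query entity with probability `restart`; otherwise it stays on its current entity with probability `laziness`, or moves to a neighbor in proportion to edge weight. The function names (`build_undirected`, `lazy_rwr`, `related_entities`), the parameter values, and the toy entity graph are hypothetical. They are not the authors' settings or data, and the sketch ignores the sentiment, quality, and category metadata.

```python
from typing import Dict, List, Tuple

# Hypothetical toy format: entity -> {neighbor entity -> positive edge weight}.
Graph = Dict[str, Dict[str, float]]


def build_undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    """Builds a symmetric weighted adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def lazy_rwr(graph: Graph, query: str, restart: float = 0.15,
             laziness: float = 0.5, max_iters: int = 200,
             tol: float = 1e-12) -> Dict[str, float]:
    """Power iteration for a lazy random walk with restart (assumed formulation).

    Returns a probability score for every entity, including the query itself.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    if query not in nodes:
        raise KeyError(f"unknown query entity: {query!r}")
    # Total outgoing weight per entity, used to normalize transitions.
    out_weight = {u: sum(graph.get(u, {}).values()) for u in nodes}

    p = {u: 0.0 for u in nodes}
    p[query] = 1.0  # the walk starts at the query entity
    for _ in range(max_iters):
        nxt = {u: 0.0 for u in nodes}
        # Restart: total mass is always 1, so `restart` of it returns to the query.
        nxt[query] = restart
        for u, mass in p.items():
            walk = (1.0 - restart) * mass
            nxt[u] += laziness * walk  # lazy step: stay on the current entity
            spread = (1.0 - laziness) * walk
            if out_weight[u] > 0.0:
                for v, w in graph[u].items():
                    nxt[v] += spread * w / out_weight[u]
            else:
                nxt[query] += spread  # dangling entity: send its mass to the query
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def related_entities(graph: Graph, query: str, k: int = 5, **kwargs) -> List[str]:
    """Top-k entities by walk score, excluding the query (ties broken by name)."""
    scores = lazy_rwr(graph, query, **kwargs)
    ranked = sorted((u for u in scores if u != query),
                    key=lambda u: (-scores[u], u))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy network with made-up weights, for illustration only.
    toy = build_undirected([
        ("Penguin", "Sweater", 2.0),
        ("Penguin", "Oil spill", 3.0),
        ("Sweater", "Knitting", 4.0),
        ("Oil spill", "Tasmania", 1.0),
        ("Knitting", "Wool", 2.0),
    ])
    print(related_entities(toy, "Penguin", k=2))  # ['Oil spill', 'Sweater']
```

The restart probability keeps the scores concentrated around the query, while the lazy self-loop slows how quickly mass leaves each entity. Entities reachable through several heavily weighted paths can therefore outrank entities that are merely adjacent to the query. On networks with millions of entities, a sparse-matrix or approximate implementation would replace the dictionary loops used here.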
This paper discusses our work, focusing on our experience in designing, developing, and evaluating such a system. We also discuss the challenges in developing large-scale systems that aim to drive curiosity in search.

Cited By

  • (2022) Researching Serendipity in Digital Information Environments. Online publication date: 10-Mar-2022
  • (2017) Researching Serendipity in Digital Information Environments. Synthesis Lectures on Information Concepts, Retrieval, and Services 9:6 (i-91). DOI: 10.2200/S00790ED1V01Y201707ICR059. Online publication date: 28-Sep-2017
  • (2016) Beyond entities: promoting explorative search with bundles. Information Retrieval Journal 19:5 (447-486). DOI: 10.1007/s10791-016-9283-5. Online publication date: 13-Jul-2016

Published In

ACM SIGWEB Newsletter, Volume 2014, Issue Autumn (Autumn 2014)
41 pages
ISSN: 1931-1745
EISSN: 1931-1435
DOI: 10.1145/2682914
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2014
Published in SIGWEB Volume 2014, Issue Autumn


Funding Sources

  • Linguistically Motivated Semantic Aggregation Engines (LiMoSINe) EU project
