research-article

Fun Facts: Automatic Trivia Fact Extraction from Wikipedia

Authors:

Dafna Shahaf

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

Pages 345 - 354

https://doi.org/10.1145/3018661.3018709

Published: 02 February 2017

Abstract

A significant portion of web search queries directly refers to named entities. Search engines explore various ways to improve the user experience for such queries. We suggest augmenting search results with trivia facts about the searched entity. Trivia is widely played throughout the world and has been shown to increase users' engagement and retention.

Most random facts are not suitable for a trivia section; there is skill (and art) to curating good trivia. In this paper, we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia's category structure and rank an entity's categories by their trivia quality. Our algorithm is capable of finding interesting facts, such as Obama's Grammy or Elvis' stint as a tank gunner. In user studies, our algorithm scores 45% higher than prior work at capturing the intuitive notion of "good trivia". Search-page tests show a 22% decrease in bounce rates and a 12% increase in dwell time, indicating that our facts hold users' attention.
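
To make the idea of ranking an entity's categories concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose scoring function the abstract does not describe. It assumes a hypothetical score in which a category ranks higher the less the entity resembles the category's other members. The `FEATURES` tags, the shortened member lists in `CATEGORIES`, the Jaccard similarity, and the function names are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy data: hand-picked descriptive tags per entity (an assumption,
# not data from the paper) and tiny category member lists for illustration.
FEATURES: Dict[str, Set[str]] = {
    "Barack Obama": {"politician", "lawyer", "author"},
    "Bill Clinton": {"politician", "lawyer"},
    "George W. Bush": {"politician", "businessman"},
    "Adele": {"singer", "songwriter"},
    "Beyonce": {"singer", "actress"},
}

CATEGORIES: Dict[str, List[str]] = {
    "Presidents of the United States": ["Barack Obama", "Bill Clinton", "George W. Bush"],
    "Grammy Award winners": ["Barack Obama", "Adele", "Beyonce"],
}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the tag sets of two entities."""
    fa, fb = FEATURES[a], FEATURES[b]
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def rank_categories(
    entity: str,
    categories: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Rank categories by an assumed score: 1 - mean similarity to other members."""
    scored = []
    for name, members in categories.items():
        others = [m for m in members if m != entity]
        if not others:
            continue  # a category with no other members gives nothing to compare against
        mean_sim = sum(similarity(entity, m) for m in others) / len(others)
        scored.append((name, 1.0 - mean_sim))
    # Highest score first: the category where the entity looks most out of place.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_categories("Barack Obama", CATEGORIES, jaccard):
        print(f"{score:.3f}  {name}")
```

On this toy data the Grammy category ranks above the presidential one, because the entity shares no tags with the other Grammy winners. A real system would need a much richer similarity signal than hand-written tags.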

Index Terms

Fun Facts: Automatic Trivia Fact Extraction from Wikipedia
1. Information systems
  1. Information systems applications
    1. Collaborative and social computing systems and tools
      1. Wikis
    2. Data mining
  2. World Wide Web
    1. Web searching and information discovery

Published In

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining

February 2017

868 pages

ISBN:9781450346757

DOI:10.1145/3018661

General Chairs: Maarten de Rijke (University of Amsterdam), Milad Shokouhi (Microsoft)

Program Chairs: Andrew Tomkins (Google), Min Zhang (Tsinghua University)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM 2017

WSDM 2017: Tenth ACM International Conference on Web Search and Data Mining

February 6 - 10, 2017

Cambridge, United Kingdom

Acceptance Rates

WSDM '17 Paper Acceptance Rate 80 of 505 submissions, 16%;

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Cited By

Tiwary N, Mohd Noah S, Fauzi F, Yee T (2024). Max Explainability Score–A quantitative metric for explainability evaluation in knowledge graph-based recommendations. Computers and Electrical Engineering, 116:C. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1016/j.compeleceng.2024.109190
Boland K, Fafalios P, Tchechmedjiev A, Dietze S, Todorov K (2022). Beyond facts – a survey and conceptualisation of claims in online discourse analysis. Semantic Web, 13:5, pages 793-827. Online publication date: 18-Aug-2022.
https://doi.org/10.3233/SW-212838
Wang P, Yu M, Liu Y (2022). Assessing the Content Topics of the Educational Videos on Tik Tok for Science Communication. Proceedings of the 2022 6th International Seminar on Education, Management and Social Sciences (ISEMSS 2022), pages 1792-1801. Online publication date: 29-Dec-2022.
https://doi.org/10.2991/978-2-494069-31-2_210
Jatowt A, Hung I, Färber M, Campos R, Yoshikawa M (2021). Exploding TV Sets and Disappointing Laptops: Suggesting Interesting Content in News Archives Based on Surprise Estimation. Advances in Information Retrieval, pages 254-269. Online publication date: 27-Mar-2021.
https://doi.org/10.1007/978-3-030-72113-8_17
Miller S, El-Bahrawy A, Dittus M, Graham M, Wright J (2020). Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets. Proceedings of The Web Conference 2020, pages 2669-2675. Online publication date: 20-Apr-2020.
https://dl.acm.org/doi/10.1145/3366423.3380022
Wang X, Jiang M (2020). Precise temporal slot filling via truth finding with data-driven commonsense. Knowledge and Information Systems. Online publication date: 16-Jul-2020.
https://doi.org/10.1007/s10115-020-01493-w
Wang X, Zhang H, Li Q, Shi Y, Jiang M (2019). A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts. The World Wide Web Conference, pages 3328-3334. Online publication date: 13-May-2019.
https://dl.acm.org/doi/10.1145/3308558.3313435
Korn F, Wang X, Wu Y, Yu C, Boncz P, Manegold S, Ailamaki A, Deshpande A, Kraska T (2019). Automatically Generating Interesting Facts from Wikipedia Tables. Proceedings of the 2019 International Conference on Management of Data, pages 349-361. Online publication date: 25-Jun-2019.
https://dl.acm.org/doi/10.1145/3299869.3314043
Pasca M, Wolfe T, Culpepper J, Moffat A, Bennett P, Lerman K (2019). Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pages 78-86. Online publication date: 30-Jan-2019.
https://dl.acm.org/doi/10.1145/3289600.3291020
Pasca M, Champin P, Gandon F, Médini L, Lalmas M, Ipeirotis P (2018). Finding Needles in an Encyclopedic Haystack. Proceedings of the 2018 World Wide Web Conference, pages 1267-1276. Online publication date: 10-Apr-2018.
https://dl.acm.org/doi/10.1145/3178876.3186025
