DOI: 10.1145/3018661.3018709
research-article

Fun Facts: Automatic Trivia Fact Extraction from Wikipedia

Published: 02 February 2017

Abstract

A significant portion of web search queries refer directly to named entities, and search engines explore various ways to improve the user experience for such queries. We suggest augmenting search results with trivia facts about the searched entity. Trivia is played widely throughout the world and has been shown to increase user engagement and retention.
Most random facts are not suitable for a trivia section; curating good trivia takes skill (and art). In this paper, we formalize a notion of trivia-worthiness and propose an algorithm that automatically mines trivia facts from Wikipedia. We take advantage of Wikipedia's category structure and rank an entity's categories by their trivia quality. Our algorithm is capable of finding interesting facts, such as Obama's Grammy or Elvis' stint as a tank gunner. In user studies, our algorithm captured the intuitive notion of "good trivia" 45% better than prior work. Search-page tests showed a 22% decrease in bounce rates and a 12% increase in dwell time, suggesting that our facts hold users' attention.
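
The abstract says only that an entity's Wikipedia categories are ranked by trivia quality; it does not give the scoring function. The sketch below is a minimal illustration under one assumed scoring: a category counts as trivia-worthy when its other members resemble each other (cohesive) while the entity does not resemble them (surprising). The Jaccard similarity over descriptor sets, the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity between two descriptor sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cohesiveness(members, features):
    """Average pairwise similarity among the given category members."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return mean(jaccard(features[x], features[y]) for x, y in pairs)


def surprise(entity, others, features):
    """One minus the entity's average similarity to the other members."""
    if not others:
        return 0.0
    return 1.0 - mean(jaccard(features[entity], features[m]) for m in others)


def rank_categories(entity, categories, features):
    """Rank category names by the ASSUMED score: cohesiveness * surprise."""
    scores = {}
    for name, members in categories.items():
        others = [m for m in members if m != entity]
        scores[name] = cohesiveness(others, features) * surprise(entity, others, features)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, hand-written descriptors (hypothetical; real input would come from article text).
features = {
    "entity": {"politician", "lawyer", "author"},
    "pol_a": {"politician", "lawyer"},
    "pol_b": {"politician", "author"},
    "mus_a": {"musician", "singer"},
    "mus_b": {"musician", "singer", "guitarist"},
}
categories = {
    "politicians": ["entity", "pol_a", "pol_b"],
    "music_award_winners": ["entity", "mus_a", "mus_b"],
}

ranked = rank_categories("entity", categories, features)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With this toy data the music category ranks first: its other members are similar to one another, and the entity shares no descriptors with them. A practical system would need a richer similarity measure than set overlap; the set-based version is used here only to keep the example self-contained.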


Published In

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. factoids
  2. fun facts
  3. serendipity
  4. surprise
  5. trivia

Qualifiers

  • Research-article

Conference

WSDM 2017

Acceptance Rates

WSDM '17 Paper Acceptance Rate: 80 of 505 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Cited By

  • (2024) Max Explainability Score–A quantitative metric for explainability evaluation in knowledge graph-based recommendations. Computers and Electrical Engineering 116:C. DOI: 10.1016/j.compeleceng.2024.109190. Online publication date: 1-May-2024
  • (2022) Beyond facts – a survey and conceptualisation of claims in online discourse analysis. Semantic Web 13:5 (793-827). DOI: 10.3233/SW-212838. Online publication date: 18-Aug-2022
  • (2022) Assessing the Content Topics of the Educational Videos on Tik Tok for Science Communication. Proceedings of the 2022 6th International Seminar on Education, Management and Social Sciences (ISEMSS 2022) (1792-1801). DOI: 10.2991/978-2-494069-31-2_210. Online publication date: 29-Dec-2022
  • (2021) Exploding TV Sets and Disappointing Laptops: Suggesting Interesting Content in News Archives Based on Surprise Estimation. Advances in Information Retrieval (254-269). DOI: 10.1007/978-3-030-72113-8_17. Online publication date: 27-Mar-2021
  • (2020) Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets. Proceedings of The Web Conference 2020 (2669-2675). DOI: 10.1145/3366423.3380022. Online publication date: 20-Apr-2020
  • (2020) Precise temporal slot filling via truth finding with data-driven commonsense. Knowledge and Information Systems. DOI: 10.1007/s10115-020-01493-w. Online publication date: 16-Jul-2020
  • (2019) A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts. The World Wide Web Conference (3328-3334). DOI: 10.1145/3308558.3313435. Online publication date: 13-May-2019
  • (2019) Automatically Generating Interesting Facts from Wikipedia Tables. Proceedings of the 2019 International Conference on Management of Data (349-361). DOI: 10.1145/3299869.3314043. Online publication date: 25-Jun-2019
  • (2019) Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (78-86). DOI: 10.1145/3289600.3291020. Online publication date: 30-Jan-2019
  • (2018) Finding Needles in an Encyclopedic Haystack. Proceedings of the 2018 World Wide Web Conference (1267-1276). DOI: 10.1145/3178876.3186025. Online publication date: 10-Apr-2018
