demonstration

The Tag Genome Dataset for Books

Authors:
Denis Kotkov, University of Helsinki, Finland
Alan Medlar, University of Helsinki, Finland
Alexandr Maslov, Åbo Akademi University, Finland
Umesh Raj Satyal, Åbo Akademi University, Finland
Mats Neovius, Åbo Akademi University, Finland
Dorota Glowacka, University of Helsinki, Finland
CHIIR '22: Proceedings of the 2022 Conference on Human Information Interaction and Retrieval, March 2022, Pages 353–357. https://doi.org/10.1145/3498366.3505833

Published: 14 March 2022

ABSTRACT

Many online systems attach tags to items, such as books or movies. While the majority of these systems use binary tags, continuous item-tag relevance scores, such as those in tag genome, offer richer descriptions of item content. For example, tag genome for movies assigns the tag “gangster” to the movie “The Godfather (1972)” with a score of 0.93 on a scale of 0 to 1. Tag genome has received considerable attention in recommender systems research and has been used in a wide variety of studies, from investigating the effects of recommender systems on users to generating ideas for movies that appeal to certain user groups.

In this paper, we present tag genome for books, a dataset containing book-tag relevance scores, where a significant number of tags overlap with those from tag genome for movies. To generate our dataset, we designed a survey based on popular books and tags from the Goodreads dataset. In our survey, we asked users to provide ratings for how well tags applied to books. We generated book-tag relevance scores based on user ratings along with features from the Goodreads dataset. In addition to being used to create book recommender systems, tag genome for books can be combined with the tag genome for movies to tackle cross-domain problems, such as recommending books based on movie preferences.
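
As a rough illustration of the cross-domain idea mentioned above, the sketch below ranks books for a given movie by comparing relevance scores on the tags the two items share. It is not the authors' method: the dictionary layout, the function names, the choice of cosine similarity, and all book titles and scores other than the “gangster” score of 0.93 quoted in the abstract are assumptions made for this example.

```python
import math

# Toy relevance scores on a 0-to-1 scale. Only the "gangster" score for
# "The Godfather (1972)" comes from the abstract; every other number, tag
# and book title is a made-up placeholder, not data from either dataset.
movie_genome = {
    "The Godfather (1972)": {"gangster": 0.93, "family": 0.80, "romance": 0.10},
}
book_genome = {
    "Hypothetical Book A": {"gangster": 0.85, "family": 0.70, "magic": 0.05},
    "Hypothetical Book B": {"gangster": 0.02, "romance": 0.90, "magic": 0.60},
}


def cosine_on_shared_tags(a, b):
    """Cosine similarity restricted to tags present in both score dicts."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_books_for_movie(movie, movies, books):
    """Return (book, similarity) pairs sorted from most to least similar."""
    scores = [
        (title, cosine_on_shared_tags(movies[movie], tags))
        for title, tags in books.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_books_for_movie("The Godfather (1972)", movie_genome, book_genome)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

Restricting the comparison to shared tags is one simple way to handle two tag vocabularies that only partly overlap. It becomes unreliable when very few tags are shared, so a practical system would likely require a minimum overlap or use a different similarity measure.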

Index Terms

The Tag Genome Dataset for Books
1. Applied computing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Recommender systems
  2. Information systems applications

Index terms have been assigned to the content through auto-classification.

Published in
CHIIR '22: Proceedings of the 2022 Conference on Human Information Interaction and Retrieval
March 2022
399 pages
ISBN: 9781450391863
DOI: 10.1145/3498366
General Chairs:
David Elsweiler, University of Regensburg, Bavaria, Germany
Udo Kruschwitz, University of Regensburg, Bavaria, Germany
Bernd Ludwig, University of Regensburg, Bavaria, Germany
Copyright © 2022 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
books
dataset
item-tag rating
recommender systems
tag genome
tag relevance
tagging
Qualifiers
- demonstration
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 55 of 163 submissions, 34%