research-article

How do People Sort by Ratings?

Authors: Jerry O. Talton III, Konstantinos Koiliaris, and Ranjitha S. Kumar

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems

Paper No.: 305, Pages 1 - 10

https://doi.org/10.1145/3290605.3300535

Published: 02 May 2019

Abstract

Sorting items by user rating is a fundamental interaction pattern of the modern Web, used to rank products (Amazon), posts (Reddit), businesses (Yelp), movies (YouTube), and more. To implement this pattern, designers must take in a distribution of ratings for each item and define a sensible total ordering over them. This is a challenging problem, since each distribution is drawn from a distinct sample population, rendering the most straightforward method of sorting (comparing averages) unreliable when the samples are small or of different sizes. Several statistical orderings for binary ratings have been proposed in the literature (e.g., based on the Wilson score, or Laplace smoothing), each attempting to account for the uncertainty introduced by sampling. In this paper, we study this uncertainty through the lens of human perception, and ask "How do people sort by ratings?" In an online study, we collected 48,000 item-ranking pairs from 4,000 crowd workers along with 4,800 rationales, and analyzed the results to understand how users make decisions when comparing rated items. Our results shed light on the cognitive models users employ to choose between rating distributions, which sorts of comparisons are most contentious, and how the presentation of rating information affects users' preferences.
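To make the contrast between these orderings concrete, the following is a minimal sketch of three ways to score binary (up/down) ratings: the plain average, a Laplace-smoothed average, and the lower bound of the Wilson score interval. It uses the textbook forms of these formulas and is not taken from the paper; the function names, the smoothing constant `alpha`, the confidence level `z = 1.96`, and the three example items are all assumptions chosen for illustration.

```python
import math

def average(up: int, down: int) -> float:
    """Plain fraction of positive ratings (0.0 when there are no ratings)."""
    n = up + down
    return up / n if n else 0.0

def laplace(up: int, down: int, alpha: float = 1.0) -> float:
    """Laplace-smoothed fraction: adds `alpha` pseudo-votes to each side."""
    return (up + alpha) / (up + down + 2 * alpha)

def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the positive fraction."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

def rank(items: dict, score) -> list:
    """Return item names ordered from highest to lowest score."""
    return sorted(items, key=lambda name: score(*items[name]), reverse=True)

# Hypothetical items: name -> (positive ratings, negative ratings).
ITEMS = {"A": (2, 0), "B": (90, 10), "C": (40, 10)}

if __name__ == "__main__":
    for score in (average, laplace, wilson_lower_bound):
        print(f"{score.__name__:>20}: {rank(ITEMS, score)}")
```

With these made-up counts, the plain average puts item A first on the strength of only two ratings, while both smoothed scores move it to last place behind the items with many more ratings. Which of these orderings best matches how people actually choose is the question the paper studies; the sketch takes no position on that.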

Index Terms

How do People Sort by Ratings?
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems

May 2019

9077 pages

ISBN:9781450359702

DOI:10.1145/3290605

General Chairs: Stephen Brewster (University of Glasgow, Scotland, UK), Geraldine Fitzpatrick (TU Wien, Austria)

Program Chairs: Anna Cox (University College London, UK), Vassilis Kostakos (University of Melbourne, Australia)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CHI '19

Sponsor:

SIGCHI

CHI '19: CHI Conference on Human Factors in Computing Systems

May 4 - 9, 2019

Glasgow, Scotland, UK

Acceptance Rates

CHI '19 Paper Acceptance Rate 703 of 2,958 submissions, 24%;

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
