Gender, writing and ranking in review forums: a case study of the IMDb

Otterbacher, Jahna

doi:10.1007/s10115-012-0548-z

Regular Paper
Published: 18 September 2012

Volume 35, pages 645–664, (2013)

Knowledge and Information Systems

Jahna Otterbacher¹


Abstract

Online review forums provide consumers with essential information about goods and services by facilitating word-of-mouth communication. Although preferences are correlated with demographic characteristics, reviewer gender is not often provided on user profiles. We consider the case of the Internet Movie Database (IMDb), where users exchange views on movies. Like many forums, IMDb employs collaborative filtering such that by default, reviews are ranked by perceived utility. IMDb also provides a unique gender filter that displays an equal number of reviews authored by men and women. Using logistic classification, we compare reviews with respect to writing style, content and metadata features. We find salient differences in stylistic features and content between reviews written by men and women, as predicted by sociolinguistic theory. However, utility is the best predictor of gender, with women’s reviews perceived as being much less useful than those written by men. While we cannot observe who votes at IMDb, we do find that highly rated female-authored reviews exhibit “male” characteristics. Our results have implications for which contributions are likely to be seen, and to what extent participants get a balanced view as to “what others think” about an item.


Notes

Following [52], we define utility as the number of users who found a review useful divided by the total number of votes received (i.e., x/y).
While there are other movie review corpora available (e.g., for studying sentiment analysis), we were not able to find existing data with author gender.
Unlike \(R^2\) in OLS regression, the pseudo \(R^2\) cannot be interpreted as the proportion of variance in the dependent variable that is explained by the model; it is a simple measure of the strength of association between the predictors and the dependent variable. It is therefore a useful guide in choosing an appropriate model, but has no literal interpretation.
http://www.nytimes.com/roomfordebate/2011/02/02/where-are-the-women-in-wikipedia
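
As a worked illustration of the first and third notes, the sketch below computes the utility ratio x/y and one common pseudo \(R^2\). It is not the author's code. The function names are hypothetical; the choice of McFadden's formulation (one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model) is an assumption, since the note does not say which pseudo \(R^2\) was used; and rejecting reviews with zero votes is likewise an assumption about how the undefined case is handled.

```python
def utility(useful_votes: int, total_votes: int) -> float:
    """Share of voters who found a review useful (x / y in the first note)."""
    if total_votes <= 0:
        # Assumption: a review with no votes has no defined utility.
        raise ValueError("utility is undefined for a review with no votes")
    if not 0 <= useful_votes <= total_votes:
        raise ValueError("useful_votes must lie between 0 and total_votes")
    return useful_votes / total_votes


def mcfadden_pseudo_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R^2 (assumed variant): 1 - ll_model / ll_null.

    ll_model: log-likelihood of the fitted logistic model.
    ll_null:  log-likelihood of the intercept-only model.
    """
    if ll_null == 0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - ll_model / ll_null


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(utility(3, 4))
    print(mcfadden_pseudo_r2(-60.0, -100.0))
```

For example, a review that 3 of 4 voters marked as useful has utility 0.75. The pseudo \(R^2\) value is only meaningful for comparing candidate models fitted to the same data, in line with the third note.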

References

Acquisti A, Gross R (2006) Imagined communities: awareness, information sharing, and privacy on Facebook. In: Privacy enhancing technologies. Lecture notes in computer science, Springer, Berlin
Ahmed A, Low Y, Aly M, Josifovski V, Smola A (2011) Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, pp 114–122
Argamon S, Koppel M, Fine J, Shimoni AR (2003) Gender, genre, and writing style in formal written texts. Text 23
Argamon S, Koppel M, Pennebaker JW, Schler J (2007) Mining the blogosphere: age, gender and the varieties of self-expression. First Monday 12(9). http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2003/1878
Awad NF, Ragowsky A (2008) Establishing trust in electronic commerce through online word of mouth: an examination across genders. J Manag Inf Syst 24(4):101–121
Bruckman A (1996) Gender swapping on the internet. In: Ludlow P (ed) High noon on the electronic frontier: conceptual issues in cyberspace. MIT Press, Cambridge
Chung C, Pennebaker JW (2008) Revealing dimensions of thinking in open-ended self-descriptions: an automated meaning extraction method for natural language. J Res Pers 42:96–132
Coates J (1993) Women, men, and language. Longman, London
Danescu-Niculescu-Mizil C, Kossinets G, Kleinberg J, Lee L (2009) How opinions are received by online communities: a case study on Amazon.com helpfulness votes. In: Proceedings of the international world wide web conference, Madrid, Spain, pp 141–150
Dellarocas C (2003) The digitization of word of mouth: promise and challenges of online feedback mechanisms. Manag Sci 49(10):1407–1424
Foltz PW, Laham D, Landauer TK (1999) Automated essay scoring: applications to educational technology. In: Proceedings of world conference on educational multimedia, hypermedia and telecommunications, Chesapeake, pp 939–944
Gefen D, Ridings CM (2005) If you spoke as she does, sir, instead of the way you do: a sociolinguistics perspective of gender differences in virtual communities. ACM SIGMIS Database 3(2):78–92
Ghose A, Ipeirotis PG (2010) Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics. IEEE Trans Knowl Data Eng 1498–1512
Glott R, Ghosh R, Schmidt P (2010) Analysis of Wikipedia survey UNU-MERIT http://www.wikipediasurvey.org/docs/Wikipedia_Overview_15March2010-FINAL.pdf
Herring SC (1996) Posting in a different voice: gender and ethics in computer-mediated communication. In: Ess C (ed) Philosophical perspectives on computer-mediated communication. SUNY Press, New York, pp 115–145
Herring SC (1996b) Two variants of an electronic message schema. In: Herring SC (ed) Computer-mediated communication: linguistic, social and cross-cultural perspectives. John Benjamins, New York, pp 81–108
Herring SC (2003) Gender and power in online communication. In: Holmes J, Meyerhoff M (eds) The handbook of language and gender. Blackwell, Oxford, pp 202–228
Hirschman EC, Holbrook MB (1982) Hedonic consumption: emerging concepts, methods and propositions. J Mark 46(Summer):92–101
Holbrook MB, Schindler RM (1994) Age, sex and attitude toward the past as predictors of consumers’ aesthetic tastes for cultural products. J Mark Res 31:412–422
Hu J, Zeng H-J, Li H, Niu C, Chen Z (2007) Demographic prediction based on user’s browsing behavior. In: Proceedings of the ACM WWW, Banff, Alberta, May 2007, pp 151–160
Joachims T, Granka L, Pan B, Hembrooke H, Radlinski F, Gay G (2007) Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search. ACM Trans Inf Syst 25(2). http://doi.acm.org/10.1145/1229179.1229181
Kaiser HF (1960) The application of electronic computers to factor analysis. Educ Psychol Meas 20:141–151
Koppel M, Argamon S, Shimoni AR (2003) Automatically categorizing written texts by author gender. Lit Linguist Comput 17(4):401–412
Kostakos V (2009) Is the crowd’s wisdom biased? A quantitative analysis of three online communities. In: Proceedings of IEEE social communication, international symposium on social intelligence and networking, Vancouver, Canada, pp 251–255
Lakoff G (1973) Hedges: a study in meaning criteria and the logic of fuzzy concepts. J Philos Log 2(4):458–508
Lakoff R (1973) Language and woman’s place. Lang Soc 2:45–79
Landauer TK, Foltz PW, Laham D (1998) An introduction to latent semantic analysis. Discourse Process 25:259–284
Liu J, Cao Y, Lin C-Y, Huang Y, Zhou M (2007) Low-quality product review detection in opinion summarization. In: Proceedings of the conference on empirical methods in natural language processing, pp 334–342
Loehlin JC (1992) Latent variable models. Lawrence Erlbaum Associates, London
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60
Manning CD, Schutze H (2000) Foundations of statistical natural language processing. MIT Press, Cambridge
Menard S (2002) Applied logistic regression analysis. Quantitative applications in the social sciences. Sage University Press, Beverly Hills
Mitchell T (1997) Machine learning. McGraw Hill, New York
Muhlestein D, Lim S (2011) Online learning with social computing based interest sharing. Knowl Inf Syst 26:31–58
Oliver MB, Weaver JB III (2000) An examination of factors related to sex differences in enjoyment of sad films. J Broadcast Electron Media 44(2):282–300
Popescu A, Grefenstette G (2010) Mining user home location and gender from Flickr tags. In: Proceedings of the 4th international conference on weblogs and social media, Washington, DC, May 2010
Radev D, Jing H, Stys M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manag 40:919–938
Roth M, Ben-David A, Deutscher D, Flysher G, Horn I, Leichtberg A, Leiser N, Matias Y, Merom R (2010) Suggesting friends using the implicit social graph. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, Washington, DC, pp 233–242
Sahlgren M, Karlgren J (2009) Terminology mining in social media. In: Proceedings of the ACM conference on information and knowledge management, Hong Kong, Nov 2009, pp 405–414
Salton G, McGill MJ (1986) Introduction to modern information retrieval. McGraw-Hill, Inc., New York
Schein AI, Popescul A, Ungar LH, Pennock DM (2002) Methods and metrics for cold-start recommendations. In: Proceedings of SIGIR, pp 253–260
Schindler RM, Bickart B (2005) Published word of mouth: referable, consumer-generated information on the Internet. In: Haugtvedt C, Machleit K, Yalch R (eds) Online consumer psychology: understanding and influencing behavior in the virtual world. Lawrence Erlbaum Associates, London, pp 35–61
Spender D (1989) The writing or the sex or why you don’t have to read women’s writing to know it’s no good. Elsevier, Oxford
Stamatatos E, Kokkinakis G, Fakotakis N (2000) Automatic text categorization in terms of genre and author. Comput Linguist 26(4):471–495
Stutzman F (2006) An evaluation of identity-sharing behavior in social network communities. Intern Digit Media Arts J 3(1):10–18
Tannen D (1990) You just don’t understand. HarperCollins Publishers, Inc., New York
Terveen L, McDonald DW (2005) Social matching: a framework and research agenda. ACM Trans Comput Hum Interact 12(3):401–434
Tsaparas P, Ntoulas A, Terzi E (2011) Selecting a comprehensive set of reviews. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, San Diego, pp 168–176
Wang D, Tse QCK, Zhou Y (2011) A decentralized search engine for dynamic Web communities. Knowl Inf Syst 26:105–125
Yang Y (1999) An evaluation of statistical approaches to text categorization. Inf Retr J 1(1):69–90
Zhai Z, Liu B, Xu H, Jia P (2011) Clustering product features for opinion mining. In: Proceedings of the fourth ACM international conference on Web search and data mining, Hong Kong, pp 347–354
Zhang Z, Varadarajan B (2006) Utility scoring of product reviews. In: Proceedings of the ACM conference on information and knowledge management, Arlington, pp 51–57


Acknowledgments

We thank the anonymous reviewers who provided helpful feedback on this work, as well as the reviewers of an earlier version of this work, which appeared at ACM CIKM 2010. We also acknowledge the insightful advice of Alexia Panayiotou, as well as Mengyuan (Serena) Li’s assistance with data collection.

Author information

Authors and Affiliations

Department of Humanities, Illinois Institute of Technology, Chicago, IL, 60616, USA
Jahna Otterbacher


Corresponding author

Correspondence to Jahna Otterbacher.


About this article

Cite this article

Otterbacher, J. Gender, writing and ranking in review forums: a case study of the IMDb. Knowl Inf Syst 35, 645–664 (2013). https://doi.org/10.1007/s10115-012-0548-z


Received: 18 February 2011
Revised: 05 April 2012
Accepted: 22 August 2012
Published: 18 September 2012
Issue Date: June 2013
DOI: https://doi.org/10.1007/s10115-012-0548-z
