Research with User-Generated Book Review Data: Legal and Ethical Pitfalls and Contextualized Mitigations

  • Conference paper
Information for a Better World: Normality, Virtuality, Physicality, Inclusivity (iConference 2023)

Abstract

The growing quantity of user-generated book reviews has opened up unprecedented opportunities for empirical research on books, reading, and readership. While an abundance of literature addresses the legal and ethical use of user-generated and social media data in general, such discussions have been mostly absent for user-generated book reviews. From a library and information sciences perspective, user-generated book reviews can pose novel challenges because each book reviewer may simultaneously be (1) a presumably anonymous and safe online user and (2) an identifiable reader who can suffer real harm, e.g., doxing and personal attacks. This user/reader duality can create conflicting recommendations regarding which legal or ethical guidelines to follow. According to our review, potential legal issues include copyright infringement, violations of terms of service/end-user license agreements, and violations of privacy rights, while ethical concerns center on users’ expectations, informed consent, and institutional reviews. This paper reviews (1) potential legal and ethical pitfalls in leveraging user-generated book reviews and (2) professional and scholarly references that might serve as useful guidelines to avoid or manage these pitfalls.

Notes

  1. In the context of this paper, user-generated book reviews include not only actual book reviews but also numerical ratings, crowdsourced tags, user-curated book lists, virtual collections of books, graphic content, etc.

  2. For example, book reviews may contain user names that overlap with real names, email addresses, identifying parts of addresses, or workplaces.

  3. Wattpad is a storytelling and social reading platform based in Canada [160].

  4. An end-user license agreement (EULA) is a contract between the licensor and the licensee that establishes the licensee’s right to use a proprietary product. Terms of service (TOS) refers to a contract between a provider and a user that defines the rules a user should follow in order to use a service. In our research contexts, we consider the two terms interchangeable, as both specify the permissions and prohibitions for using the book review platforms’ services, products, and/or data.

  5. The HathiTrust is a consortium of several hundred academic libraries that have collaborated (with scanning agencies like Google) to create a massive digital library [15, 61].

  6. The Internet Archive is a large digital library that preserves and provides digitized content to the public [154].

  7. Due to length constraints, we discuss only some of the articles that we reviewed for this paper. The full list of references is available at https://github.com/Yuerong2/iConference2023appendix/blob/main/iconference2023referencesAppendix.pdf. Our literature review is limited to empirical research on user-generated book reviews based on computational and/or qualitative methods. We did not consider theoretical work on user-generated book reviews that involves no empirical data.

  8. Amazon (Amazon.com: Books) is currently the largest online bookseller worldwide. Goodreads is one of the dominant social reading and book review platforms based in the United States, with 90 million registered members as of 2019. LibraryThing is one of the most impactful social cataloging platforms based in the United States, with 2.6 million users as of 2021 [54, 73, 95, 156, 157, 158, 159].

  9. In some publications, data collection methods are not explicitly specified, and general terms like “got”, “collected”, “downloaded” and “extracted” are used in lieu of more detailed descriptions of the collection method [4, 27, 36, 64, 139].

  10. Such considerations might not apply to studies on user-generated data. We elaborate on this issue in Sect. 3.4.

  11. In the United States, an Institutional Review Board (IRB) is an administrative unit formally designated to review and monitor research activities using human research subjects. IRBs approve or disapprove research proposals prior to their initiation to ensure the rights and welfare of human research subjects [144].

  12. Due to copyright and platform restrictions, it is advisable to share only unique key identifiers for collected data items rather than the actual datasets, so that other researchers can rehydrate the data; this approach, however, bears the risk of collecting incomplete datasets [32, 109].

  13. US copyright law requires consideration of four factors in determining whether fair use applies: the purpose and character of the use; the nature of the copyrighted work; the amount and substantiality of the portion used; and the effect of the use upon the potential market for the copyrighted work. For research based on user-generated book reviews, the first two factors may be less of a concern, but researchers should pay more attention to the third and fourth factors.

  14. “Transformative use” of the data alters original content to give it “new expression, meaning or message” [133]. “Non-consumptive use” refers to computer-assisted research, which has been found not to conflict with copyright holders’ interests. For instance, in transformative and non-consumptive research, digital humanities scholars can conduct computational text analysis of millions of books (copyrighted books included) without actually reading or re-disseminating (i.e., without human “consumption” of) any expressive content of those books [113].

  15. It should be noted that “these state laws, however, are overridden or trumped by federal laws that allow federal agencies to seek library records” [21, 90]. The state laws vary; however, they reflect a consensus that library users’ data are confidential and should be disclosed only under certain circumstances (e.g., with the user’s informed consent or under a court order).

  16. Robots.txt files are developed and used primarily to inform search engines and web scrapers whether data on a webpage is prohibited or permitted for harvesting. They are widely adopted by websites to regulate scraping, although their prohibitions “fall into a legal grey area” [123].

  17. Accessed in August 2022.

  18. In this case, hiQ scraped publicly available user data from LinkedIn’s website to supply its own business, in spite of LinkedIn’s no-data-scraping policies, letters specifically addressed to hiQ, and technical measures enacted against hiQ. LinkedIn claimed that hiQ’s scraping violated the CFAA, the Digital Millennium Copyright Act, and state trespass law, while hiQ denied these claims and asserted its right to scrape publicly accessible data [53].

  19. However, in practice, it is difficult for researchers to verify whether the reviewers are indeed aware of the public accessibility of their data. Researchers should not make assumptions about users’ awareness.

  20. Kosinski and colleagues argue that no consent is needed and that user-generated online data can be conceptualized as archival data if (1) users consciously made their data public; (2) the data collected are anonymized; (3) researchers do not interact with participants; and (4) no identifiable user information is published [87].

  21. Different IRBs might make different decisions on requests for exemption based on specific research proposals. For instance, we learned from our own research experience that analysis of publicly available and de-identified book review data without any interaction with the reviewers is most likely to be considered “Not Human Subjects Research” (NHSR) by the IRB at our home institution [142]. In this case, researchers who believe their work does not require IRB review or oversight should submit a request to their institution’s IRB for a designation as Not Human Subjects Research. They might also consider asking for an Exempt Status determination, in which case they are performing Human Subjects research but are exempt from regular oversight.
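The identifier-sharing practice described in Note 12 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record layout, the `review_id` field, and the in-memory `platform` dictionary standing in for a review platform are hypothetical and do not correspond to any platform's actual API. The sketch shows why rehydration can yield an incomplete dataset: items deleted after the original collection cannot be retrieved again.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Review = Dict[str, str]


def dehydrate(reviews: Iterable[Review], id_field: str = "review_id") -> List[str]:
    """Keep only the unique identifiers, so no review text is redistributed."""
    return [review[id_field] for review in reviews]


def rehydrate(
    ids: Iterable[str],
    fetch: Callable[[str], Optional[Review]],
) -> Tuple[List[Review], List[str]]:
    """Look up each identifier again; report the ones that are no longer available."""
    found: List[Review] = []
    missing: List[str] = []
    for review_id in ids:
        record = fetch(review_id)
        if record is None:
            missing.append(review_id)
        else:
            found.append(record)
    return found, missing


# Hypothetical stand-in for a platform; a real study would query the platform
# only in ways its terms of service permit.
platform: Dict[str, Review] = {
    "r1": {"review_id": "r1", "text": "A moving story."},
    "r2": {"review_id": "r2", "text": "Not for me."},
    "r3": {"review_id": "r3", "text": "Read it twice."},
}

shared_ids = dehydrate(platform.values())  # what the original researchers publish
del platform["r2"]                         # a reviewer later deletes a review
found, missing = rehydrate(shared_ids, platform.get)
print(len(found), missing)
```

Reporting the `missing` identifiers alongside the rehydrated data makes the incompleteness visible to later readers, rather than leaving it silent.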
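Note 16 describes robots.txt files as a common signal of whether harvesting is permitted. As a hedged sketch, Python's standard-library `urllib.robotparser` can evaluate such a file before any page is requested. The sample rules, the `example.org` URLs, and the `research-bot` user agent below are hypothetical; a real platform's file will differ, and passing this check says nothing about the platform's terms of service or the legal questions discussed in this paper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /review/
Allow: /book/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been obtained (no network access here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "research-bot") -> bool:
    """Return whether the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.org/book/123"))
print(is_allowed(parser, "https://example.org/review/456"))
```

In this sample, the book page is permitted and the review page is not, which mirrors the situation in which the content of most interest to researchers is exactly what a site asks crawlers to avoid.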

References

  1. ACM Code 2018 Task Force: ACM code of ethics and professional conduct (2018). https://www.acm.org/code-of-ethics

  2. ACM Technology Policy Council, ACM Europe Technology Policy Committee and ACM US Technology Policy Council: Statement on principles for responsible algorithmic systems (2022). https://www.acm.org/binaries/content/assets/public-policy/final-joint-ai-statement-update.pdf

  3. Acquisti, A., Brandimarte, L., Loewenstein, G.: Privacy and human behavior in the age of information. Science 347(6221), 509–514 (2015)

  4. Albrechtslund, A.M.B.: Negotiating ownership and agency in social media: community reactions to Amazon’s acquisition of Goodreads. First Monday (2017)

  5. American Civil Liberties Union: Federal court rules ‘big data’ discrimination studies do not violate federal anti-hacking law (2020). https://www.aclu.org/press-releases/federal-court-rules-big-data-discrimination-studies-do-not-violate-federal-anti

  6. American Library Association: The USA PATRIOT Act (2009). https://www.ala.org/ala/washoff/WOissues/civilliberties/theusapatriotact/usapatriotact.htm

  7. American Library Association: Intellectual freedom: issues and resources (2017). https://www.ala.org/advocacy/intfreedom

  8. American Library Association: ALA statement on book censorship (2021). https://www.ala.org/advocacy/statement-regarding-censorship

  9. American Library Association: State privacy laws regarding library records (2021). https://www.ala.org/advocacy/privacy/statelaws

  10. American Library Association Council: Policy concerning confidentiality of personally identifiable information about library users (1991). https://www.ala.org/advocacy/intfreedom/statementspols/otherpolicies/policyconcerning

  11. Markham, A., Buchanan, E.: Ethical decision-making and internet research: recommendations from the AoIR ethics working committee (version 2.0) (2012). https://aoir.org/reports/ethics2.pdf

  12. Antoniak, M., Walsh, M., Mimno, D.: Tags, borders, and catalogs: social re-working of genre on librarything. Proc. ACM Hum.-Comput. Interact. 5(CSCW1), 1–29 (2021)

  13. Asher, A., et al.: Ethics in research use of library patron data: glossary and explainer (2018). https://doi.org/10.17605/OSF.IO/XFKZ6

  14. Association for Computing Machinery: Scraping by: reconsidering law & technology for online data collection - 19 May 2022 (2022). https://www.acm.org/public-policy/ustpc/hottopics/online-data-collection

  15. Band, J.: LCA comments on authors guild v. hathitrust decision (2012). https://www.arl.org/news/lca-comments-on-authors-guild-v-hathitrust-decision/

  16. Bartley, P.: Book tagging on LibraryThing: how, why, and what are in the tags? Proc. Am. Soc. Inf. Sci. Technol. 46(1), 1–22 (2009)

  17. BBC News: Author Richard Brittain attacked reviewer with bottle (2015). https://www.bbc.com/news/uk-scotland-edinburgh-east-fife-34775814

  18. Böhme, R., Köpsell, S.: Trained to accept? A field experiment on consent dialogs. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2403–2406 (2010)

  19. Boot, P., Koolen, M.: Captivating, splendid or instructive?: assessing the impact of reading in online book reviews. Sci. Study Lit. 10(1), 35–63 (2020)

  20. Bourrier, K., Thelwall, M.: The social lives of books: reading Victorian literature on goodreads. J. Cult. Anal. 1(1), 12049 (2020)

  21. Bowers, S.L.: Privacy and library records. J. Acad. Librariansh. 32(4), 377–383 (2006)

  22. Bruckman, A.: Studying the amateur artist: a perspective on disguising data collected in human subjects research on the internet. Ethics Inf. Technol. 4(3), 217–231 (2002)

  23. California Legislative Information: Title 1.81.5. California consumer privacy act of 2018 (2018). https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5

  24. Carman, N.: LibraryThing tags and Library of Congress Subject Headings: A comparison of science fiction and fantasy works. School of Information Management at Victoria University of Wellington (2009)

  25. Chang, K., et al.: Book reviews and the consolidation of genre. In: DH2020 (ADHO) Proceedings (2020). http://dx.doi.org/10.17613/02q2-1v27

  26. Chen, P.Y., Dhanasobhon, S., Smith, M.D.: All reviews are not created equal: the disaggregate impact of reviews and reviewers at amazon.com (2008)

  27. Chevalier, J.A., Mayzlin, D.: The effect of word of mouth on sales: online book reviews. J. Mark. Res. 43(3), 345–354 (2006)

  28. Court of Appeal, Second District, Division 3, California.: Long v. Provide Commerce Inc (2016). https://caselaw.findlaw.com/ca-court-of-appeal/1729412.html

  29. Crawford, K., Finn, M.: The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters. GeoJournal 80(4), 491–502 (2015)

  30. Computer Crime and Intellectual Property Section, Criminal Division: Prosecuting computer crimes manual (2010). https://www.justice.gov/criminal/file/442156/download

  31. Dai, L.: From the history of the book to the history of reading: theories and methods for historical studies of reading. Xinxing (2017)

  32. De Greve, L., Martens, G.: # bookstagram and beyond: the presence and depiction of the Bachmann literary prize on social media (2007–2017). Digit. Humanit. Benelux J. 3, 81–102 (2021)

  33. Diesner, J., Chin, C.: Seeing the forest for the trees: considering applicable types of regulation for the responsible collection and analysis of human centered data. In: Human-Centered Data Science (HCDS) Workshop at 19th ACM Conference on Computer-Supported Cooperative Work and Social Computing (2016)

  34. Diesner, J., Chin, C.L.: Usable ethics: practical considerations for responsibly conducting research with social trace data. In: Proceedings of Beyond IRBs: Ethical Review Processes for Big Data Research (2015)

  35. Diesner, J., Chin, C.L.: Gratis, libre, or something else? Regulations and misassumptions related to working with publicly available text data. In: Actes du Workshop on Ethics In Corpus Collection, Annotation & Application (ETHI-CA2), LREC, Portoroz, Slovénie (2016)

  36. Dimitrov, S., Zamal, F., Piper, A., Ruths, D.: Goodreads versus amazon: the effect of decoupling book reviewing and book selling. In: Ninth International AAAI Conference on Web and Social Media (2015)

  37. Drew, C.: Data science ethics in government. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 374(2083), 20160119 (2016)

  38. Driscoll, B., Rehberg Sedo, D.: Faraway, so close: seeing the intimacy in goodreads reviews. Qual. Inq. 25(3), 248–259 (2019)

  39. Driscoll, B., Rehberg Sedo, D.: The transnational reception of bestselling books between Canada and Australia. Global Media Commun. 16(2), 243–258 (2020)

  40. Ehrmann, T., Schmale, H.: The hitchhiker’s guide to the long tail: the influence of online-reviews and product recommendations on book sales-evidence from German online retailing. In: ICIS 2008 Proceedings, p. 157 (2008)

  41. Ellis, D.: What charles and anti-charles reveal about goodreads homophobia (2020). https://bookriot.com/goodreads-homophobia/

  42. English, J., Ungar, L., Dhakecha, R.H., Scott, E.: Mining goodreads (literary reception studies at scale) (2018). https://pricelab.sas.upenn.edu/projects/goodreads-project

  43. Estabrook, L.S.: Sacred trust or competitive opportunity: using patron records. Libr. J. 121(2), 48–49 (1996)

  44. European Union (EU): Complete guide to GDPR (general data protection regulation) compliance (2016). https://gdpr.eu/

  45. Fiesler, C.: Ethical considerations for research involving (speculative) public data. Proc. ACM Hum.-Comput. Interact. 3(GROUP), 1–13 (2019)

  46. Fiesler, C., Beard, N., Keegan, B.C.: No robots, spiders, or scrapers: legal and ethical regulation of data collection methods in social media terms of service. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, pp. 187–196 (2020)

  47. Fiesler, C., Lampe, C., Bruckman, A.S.: Reality and perception of copyright terms of service for online content creation. In: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, pp. 1450–1461 (2016)

  48. Fiesler, C., Proferes, N.: “Participant” perceptions of twitter research ethics. Soc. Media+ Soc. 4(1), 2056305118763366 (2018)

  49. Fiesler, C.: Law & ethics of scraping: what HiQ v Linkedin could mean for researchers violating TOS (2017). https://cfiesler.medium.com/law-ethics-of-scraping-what-hiq-v-linkedin-could-mean-for-researchers-violating-tos-787bd3322540

  50. Fornaciari, T., Poesio, M.: Identifying fake amazon reviews as learning from crowds. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pp. 279–287. Association for Computational Linguistics (2014)

  51. Franzke, A.S., Bechmann, A., Zimmer, M., Ess, C.: Internet research ethics guidelines (IRE 3.0 6.1) (2019). https://aoir.org/reports/ethics3.pdf

  52. Gilbert, E., Karahalios, K.: Understanding deja reviewers. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 225–228 (2010)

  53. Goldfein, S., Keyte, J.: Big data, web ‘scraping’ and competition law: the debate continues. New York Law J. 258(49), 1–3 (2017)

  54. Goodreads: About goodreads (2022). https://www.goodreads.com/about/us

  55. Goodreads: Goodreads robots.txt file (2022). https://www.goodreads.com/robots.txt

  56. Goodreads: Terms of use (2022). https://www.goodreads.com/about/terms

  57. Gray, J., Foong, C.: Publishers vs the internet archive: why the world’s biggest online library is in court over digital book lending (2022). https://theconversation.com/publishers-vs-the-internet-archive-why-the-worlds-biggest-online-library-is-in-court-over-digital-book-lending-187166

  58. Greene, D., Hoffmann, A.L., Stark, L.: Better, nicer, clearer, fairer: a critical assessment of the movement for ethical artificial intelligence and machine learning. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 2122–2131 (2019)

  59. Guan, X., Li, Y., Gong, H., Sun, H., Zhou, C.: An improved SVM for book review sentiment polarity analysis. In: 2018 International Conference on Transportation Logistics, Information Communication, Smart City (TLICSC 2018). Atlantis Press (2018)

  60. Hajibayova, L.: Investigation of goodreads’ reviews: kakutanied, deceived or simply honest? J. Doc. 75(3), 612–626 (2019)

  61. HathiTrust Digital Library: Our digital library (2022). https://www.hathitrust.org/digital_library

  62. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web, pp. 507–517 (2016)

  63. Holur, P., Shahsavari, S., Ebrahimzadeh, E., Tangherlini, T.R., Roychowdhury, V.: Modelling social readers: novel tools for addressing reception from online book reviews. Roy. Soc. Open Sci. 8(12), 210797 (2021)

  64. Hong, H., Xu, D., Xu, D., Wang, G.A., Fan, W.: An empirical study on the impact of online word-of-mouth sources on retail sales. Inf. Discov. Deliv. 45(1), 30–35 (2017)

  65. Howison, J., Wiggins, A., Crowston, K.: Validity issues in the use of social network analysis with digital trace data. J. Assoc. Inf. Syst. 12(12), 2 (2011)

  66. Howsam, L.: Old Books and New Histories: An Orientation to Studies in Book and Print Culture. University of Toronto Press, Toronto (2006)

  67. Hu, N., Bose, I., Gao, Y., Liu, L.: Manipulation in digital word-of-mouth: a reality check for book reviews. Decis. Support Syst. 50(3), 627–635 (2011)

  68. Hu, N., Bose, I., Koh, N.S., Liu, L.: Manipulation of online reviews: an analysis of ratings, readability, and sentiments. Decis. Support Syst. 52(3), 674–684 (2012)

  69. Hu, N., Koh, N.S., Reddy, S.K.: Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales. Decis. Support Syst. 57, 42–53 (2014)

  70. Hu, N., Liu, L., Sambamurthy, V.: Fraud detection in online consumer reviews. Decis. Support Syst. 50(3), 614–626 (2011)

  71. Hu, N., Liu, L., Zhang, J.J.: Do online reviews affect product sales? The role of reviewer characteristics and temporal effects. Inf. Technol. Manag. 9(3), 201–214 (2008)

  72. Hu, Y.: Synthesizing digital libraries and digital humanities perspectives for illuminating under-investigated complexities associated with user-generated book reviews. In: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1–2 (2022)

  73. Hu, Y., LeBlanc, Z., Diesner, J., Underwood, T., Layne-Worthey, G., Downie, J.S.: Complexities associated with user-generated book reviews in digital libraries: temporal, cultural, and political case studies. In: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1–12 (2022)

  74. Hudson, J.M., Bruckman, A.: “Go away”: participant objections to being studied and the ethics of chatroom research. Inf. Soc. 20(2), 127–139 (2004)

  75. Hui, N.: Content-specific ranking prediction for online reviews-case of douban book reviews. Manag. Rev. 33(2), 176 (2021)

  76. Hutton, L., Henderson, T.: Making social media research reproducible. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 9, pp. 2–7 (2015)

  77. International Federation of Library Associations and Institutions: IFLA code of ethics for librarians and other information workers (full version) (2012). https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version/

  78. International Federation of Library Associations and Institutions: IFLA statement on privacy in the library environment (2015). https://www.ifla.org/publications/ifla-statement-on-privacy-in-the-library-environment/

  79. Jett, J., Cole, T., Maden, C., Downie, J.: The hathitrust research center workset ontology: a descriptive framework for non-consumptive research collections. J. Open Humanit. Data 2 (2016)

  80. Jiang, M., Diesner, J.: Issue-focused documentaries versus other films: rating and type prediction based on user-authored reviews. In: Proceedings of the 27th ACM Conference on Hypertext and Social Media, pp. 225–230 (2016)

  81. Jiang, M., Diesner, J.: Says who…? Identification of expert versus layman critics’ reviews of documentary films. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2122–2132 (2016)

  82. Kaminski, M.: A recent renaissance in privacy law. Commun. ACM 63(9), 24–27 (2020)

  83. Kayla: Book chat: Authors being negative towards reviewers (2017). https://gracelingaccountantblog.wordpress.com/2017/12/06/book-chat-authors-being-negative-towards-reviewers/

  84. Klinefelter, A.: Reader privacy in digital library collaborations: signs of commitment, opportunities for improvement. ISJLP 13, 199 (2016)

  85. Koolen, M., Neugarten, J., Boot, P.: ‘This book makes me happy and sad and i love it’. a rule-based model for extracting reading impact from English book reviews. J. Comput. Literary Stud. 1(1) (2022)

  86. Koolen, M., Boot, P., van Zundert, J.J.: Online book reviews and the computational modelling of reading impact. In: Proceedings of Workshop on Computational Humanities Research (CHR), vol. 1613, p. 0073 (2020)

  87. Kosinski, M., Matz, S.C., Gosling, S.D., Popov, V., Stillwell, D.: Facebook as a research tool for the social sciences: opportunities, challenges, ethical considerations, and practical guidelines. Am. Psychol. 70(6), 543 (2015)

  88. Kuijpers, M.M.: Bodily involvement in readers’ online book reviews: applying text world theory to examine absorption in unprompted reader response. J. Lit. Semant. 51(2), 111–129 (2022)

  89. Kutzner, K., Petzold, K., Knackstedt, R.: Characterising social reading platforms-a taxonomy-based approach to structure the field. In: Proceedings of the 14th International Conference on Wirtschaftsinformatik (2019)

  90. Lambert, A.D., Parker, M., Bashir, M.: Library patron privacy in jeopardy an analysis of the privacy policies of digital content vendors. Proc. Assoc. Inf. Sci. Technol. 52(1), 1–9 (2015)

  91. Lamdan, S.S.: Why library cards offer more privacy rights than proof of citizenship: librarian ethics and freedom of information act requestor policies. Gov. Inf. Q. 30(2), 131–140 (2013)

  92. Lanjinger: One-star reviewing bombing started from the truce (the diary of martín santomé) (originally in Chinese) (2021). https://k.sina.com.cn/article_5617041192_14ecd3f280200135ul.html

  93. Lavin, M.J., et al.: Cultural analytics and the book review: models, methods, and corpora. In: DH2020(ADHO) Proceedings (2020). https://dh2020.adho.org/wp-content/uploads/2020/07/516_CulturalAnalyticsandtheBookReviewModelsMethodsandCorpora.html

  94. LibraryThing: Privacy policy, community rules, and terms of service (2020). https://www.librarything.com/privacy

  95. LibraryThing: About librarything (2022). https://www.librarything.com/about

  96. Lin, E., Fang, S., Wang, J.: Mining online book reviews for sentimental clustering. In: 2013 27th International Conference on Advanced Information Networking and Applications Workshops, pp. 179–184. IEEE (2013)

  97. Lu, C., Park, J.R., Hu, X.: User tags versus expert-assigned subject terms: a comparison of librarything tags and library of congress subject headings. J. Inf. Sci. 36(6), 763–779 (2010)

  98. Lunnay, B., Borlagdan, J., McNaughton, D., Ward, P.: Ethical use of social media to facilitate qualitative research. Qual. Health Res. 25(1), 99–109 (2015)

  99. Maity, S.K., Panigrahi, A., Mukherjee, A.: Book reading behavior on goodreads can predict the amazon best sellers. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 451–454 (2017)

  100. Mannheimer, S., Pienta, A., Kirilova, D., Elman, C., Wutich, A.: Qualitative data sharing: data repositories and academic libraries as key partners in addressing challenges. Am. Behav. Sci. 63(5), 643–664 (2019)

  101. Mannheimer, S., Young, S.W., Rossmann, D.: On the ethics of social network research in libraries. J. Inf. Commun. Ethics Soc. (2016)

  102. Martens, M., Balling, G., Higgason, K.A.: # booktokmademereadit: young adult reading communities across an international, sociotechnical landscape. Inf. Learn. Sci. (ahead-of-print) (2022)

  103. McAuley, J., Targett, C., Shi, Q., Van Den Hengel, A.: Image-based recommendations on styles and substitutes. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 43–52 (2015)

  104. McCluskey, M.: Goodreads’ problem with extortion scams and review bombing (2021). https://time.com/6078993/goodreads-review-bombing/

  105. McDonald, A.M., Cranor, L.F.: The cost of reading privacy policies. ISJLP 4, 543 (2008)

  106. Mengting, W.: UCSD book graph: Goodreads datasets (2019). https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home

  107. Metcalf, J., Crawford, K.: Where are human subjects in big data research? The emerging ethics divide. Big Data Soc. 3(1), 2053951716650211 (2016)

  108. Milligan, I.: The problem of history in the age of abundance (2016). http://hdl.handle.net/10012/11817

  109. Mishra, S., Saini, A., Makki, R., Mehta, S., Haghighi, A., Mollahosseini, A.: Tweetnerd-end to end entity linking benchmark for tweets. arXiv preprint arXiv:2210.08129 (2022)

  110. Nakamura, L.: “Words with friends”: socially networked reading on goodreads. PMLA/Publ. Mod. Lang. Assoc. Am. 128(1), 238–243 (2013)

  111. Nan, X., Li, M., Shi, J.: Using altmetrics for assessing impact of highly-cited books in Chinese book citation index. Scientometrics 122(3), 1651–1669 (2020)

  112. Oltmann, S.M.: Intellectual freedom and freedom of speech: three theoretical perspectives. Libr. Q. 86(2), 153–171 (2016)

  113. Organisciak, P., Downie, J.S.: Research access to in-copyright texts in the humanities. In: Information and Knowledge Organisation in Digital Humanities, pp. 157–177. Routledge (2021)

  114. Pianzola, F., Rebora, S., Lauer, G.: Wattpad as a resource for literary studies. quantitative and qualitative examples of the importance of digital social reading and readers’ comments in the margins. PLoS ONE 15(1), e0226708 (2020)

  115. Pianzola, F., et al.: Books’ impact in digital social reading: towards a conceptual and methodological framework. In: Digital Humanities 2022 Conference Abstracts, pp. 94–98 (2022). https://dh2022.dhii.asia/dh2022bookofabsts.pdf

  116. Pinch, T.: Book reviewing for amazon.com: how socio-technical systems struggle to make less from more. In: Managing Overflow in Affluent Societies, pp. 80–99. Routledge (2012)

  117. Reads with Rachel: Author attacks book reviewer |Richard Brittain | authors behaving badly (2022). https://www.youtube.com/watch?v=4Z5iIP8c5qs

  118. Rebora, S., et al.: Digital humanities and digital social reading. Digit. Scholarsh. Humanit. 36(Supplement_2), ii230–ii250 (2021)

  119. Rebora, S., Messerli, T., Herrmann, J.B.: Towards a computational study of German book reviews. A comparison between emotion dictionaries and transfer learning in sentiment analysis. 8. Jahrestagung «Digital Humanities im deutschsprachigen Raum»(DhD), Potsdam, D. (2022)

  120. Rebora, S., Pianzola, F.: A new research programme for reading research: analysing comments in the margins on wattpad. DigitCult-Sci. J. Digit. Cult. 3(2), 19–36 (2018)

  121. Rezapour, R., Diesner, J.: Classification and detection of micro-level impact of issue-focused documentary films based on reviews. In: Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, pp. 1419–1431 (2017)

  122. Sabri, N., Weber, I.: A global book reading dataset. Data 6(8), 83 (2021)

  123. Samberg, R.G., Hennesy, C.: Law and literacy in non-consumptive text mining: guiding researchers through the landscape of computational text analysis (2019)

  124. Sen, S., Lerman, D.: Why are you telling me this? an examination into negative consumer reviews on the web. J. Interact. Mark. 21(4), 76–94 (2007)

  125. Shahsavari, S., et al.: An automated pipeline for character and relationship extraction from readers literary book reviews on goodreads.com. In: 12th ACM Conference on Web Science, pp. 277–286 (2020)

  126. Sharma, R.: Black and LGBTQ+ authors say they’re being harassed on goodreads and trolled with one-star book reviews (2021). https://inews.co.uk/culture/books/goodreadsbookreviewsblacklgbtq-authorsharrassedtrolled949179

  127. Sharma, A., Hu, Y., Wu, P., Shang, W., Singhal, S., Underwood, T.: The rise and fall of genre differentiation in English-language fiction. In: DH2020 (ADHO) Proceedings, vol. 1613, p. 0073 (2020)

  128. Sheila (Book Journey): When authors attack… (2011). https://bookjourney.net/2011/12/04/when-authors-attack/

  129. Shen, X., Zhang, K.Z., Zhao, S.J.: Understanding information adoption in online review communities: the role of herd factors. In: 2014 47th Hawaii International Conference on System Sciences, pp. 604–613. IEEE (2014)

  130. Shenglan, T., Haiqing, H., Jiang, L., Xu, Z., Selman, R.L.: Chinese and English reviews of a story about teenagers’ struggles: a multi-method analysis of cultural differences in narrative interpretation. Beijing Int. Rev. Educ. 2(3), 365–387 (2020)

  131. Sourati Hassan Zadeh, Z., Sabri, N., Chamani, H., Bahrak, B.: Quantitative analysis of fanfictions’ popularity. Soc. Netw. Anal. Mining 12(1), 1–11 (2022)

  132. Srivastava, A.K., Mishra, R.: Analyzing social media research: a data quality and research reproducibility perspective. IIM Kozhikode Soc. Manag. Rev. 12(1), 39–49 (2021)

  133. Supreme Court: Campbell v. acuff-rose music (92-1292), 510 U.S. 569 (1994). https://www.law.cornell.edu/supct/html/92-1292.ZS.html

  134. Szkolar, D.: The USA patriot act: should your library have an official policy? (2013). https://ischool.syr.edu/the-usa-patriot-act-should-your-library-have-an-official-policy/

  135. The European Parliament and the Council of the European Union: Directive (EU) 2019/790 of the European parliament and of the council of 17 April 2019 on copyright and related rights in the digital single market and amending directives 96/9/EC and 2001/29/EC (2019). https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32019L0790&from=EN

  136. Thelwall, M.: Book genre and author gender: romance > paranormal-romance to autobiography > memoir. J. Assoc. Inf. Sci. Technol. 68(5), 1212–1223 (2017)

  137. Thelwall, M.: Reader and author gender and genre in goodreads. J. Librariansh. Inf. Sci. 51(2), 403–430 (2019)

  138. Thelwall, M., Kousha, K.: Goodreads: a social network site for book readers. J. Am. Soc. Inf. Sci. 68(4), 972–983 (2017)

  139. Thomas, M., Caudle, D.M., Schmitz, C.: Trashy tags: problematic tags in librarything. New Library World (2010)

  140. Slee, T.J.: Who is the average goodreads user? You’ll be surprised! (2017). https://www.goodreads.com/author_blog_posts/14538341-who-is-the-average-goodreads-user-you-ll-be-surprised

  141. Tsur, O., Rappoport, A.: Revrank: a fully unsupervised algorithm for selecting the most helpful book reviews. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 3 (2009)

  142. University of Illinois Office for the Protection of Research Subjects: Decision trees (2022). https://oprs.research.illinois.edu/review-processes-checklists/decision-trees

  143. US Copyright Office: Copyright law of the united states (title 17) (2021). https://www.copyright.gov/title17/

  144. U.S. Food and Drug Administration: Institutional review boards frequently asked questions (1998). https://www.fda.gov/regulatory-information/search-fda-guidance-documents/institutional-review-boards-frequently-asked-questions

  145. Vaccaro, K., Karahalios, K., Sandvig, C., Hamilton, K., Langbort, C.: Agree or cancel? Research and terms of service compliance. In: ACM CSCW Ethics Workshop: Ethics for Studying Sociotechnical Systems in a Big Data World (2015)

  146. Verma, P.: The fight between authors and librarians tearing book lovers apart (2022). https://www.washingtonpost.com/technology/2022/07/25/internet-archive-digital-lending-lawsuit/

  147. Vitak, J., Proferes, N., Shilton, K., Ashktorab, Z.: Ethics regulation in social computing research: examining the role of institutional review boards. J. Empir. Res. Hum. Res. Ethics 12(5), 372–382 (2017)

  148. Vitak, J., Shilton, K., Ashktorab, Z.: Beyond the belmont principles: ethical challenges, practices, and beliefs in the online data research community. In: Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, pp. 941–953 (2016)

  149. Voorbij, H.: The value of librarything tags for academic libraries. Online Inf. Rev. 36(2), 196–217 (2012)

  150. Walsh, M., Antoniak, M.: The goodreads ‘classics’: a computational study of readers, amazon, and crowdsourced amateur criticism. J. Cult. Anal. 4, 243–287 (2021)

  151. Wan, M., McAuley, J.J.: Item recommendation on monotonic behavior chains. In: Pera, S., Ekstrand, M.D., Amatriain, X., O’Donovan, J. (eds.) Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, 2–7 October 2018, pp. 86–94. ACM (2018). https://doi.org/10.1145/3240323.3240369

  152. Wan, M., Misra, R., Nakashole, N., McAuley, J.J.: Fine-grained spoiler detection from large-scale review corpora. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July–2 August 2019, Volume 1: Long Papers, pp. 2605–2610. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1248

  153. Wang, K., Liu, X., Han, Y.: Exploring goodreads reviews for book impact assessment. J. Informet. 13(3), 874–886 (2019)

  154. Wikipedia contributors: Internet archive. Wikipedia (2022). https://en.wikipedia.org/wiki/Internet_Archive

  155. Wikipedia contributors: Personal information protection law of the people’s republic of China (2021). https://en.wikipedia.org/wiki/Personal_Information_Protection_Law_of_the_People%27s_Republic_of_China

  156. Wikipedia contributors: Amazon books (2022). https://en.wikipedia.org/wiki/Amazon_Books

  157. Wikipedia contributors: Amazon (company) (2022). https://en.wikipedia.org/wiki/Amazon_(company)

  158. Wikipedia contributors: Goodreads (2022). https://en.wikipedia.org/wiki/Goodreads

  159. Wikipedia contributors: Librarything (2022). https://en.wikipedia.org/wiki/LibraryThing

  160. Wikipedia contributors: Wattpad (2022). https://en.wikipedia.org/wiki/Wattpad

  161. World Intellectual Property Organization (WIPO): Wipo copyright treaty (1996). https://wipolex.wipo.int/en/text/295166

  162. Worrall, A.: “like a real friendship”: translation, coherence, and convergence of information values in librarything and goodreads. In: iConference 2015 Proceedings (2015)

  163. Worrall, A.: “connections above and beyond”: information, translation, and community boundaries in librarything and goodreads. J. Assoc. Inf. Sci. Technol. 70(7), 742–753 (2019)

  164. Zhang, C., Tong, T., Bu, Y.: Examining differences among book reviews from various online platforms. Online Inf. Rev. 43(7), 1169–1187 (2019)

  165. Zhou, Q., Zhang, C.: Relationship between scores and tags for Chinese books-in the case of douban book. J. Data Inf. Sci. 6(4), 40 (2013)

  166. Zhou, Q., Zhang, C., Zhao, S.X., Chen, B.: Measuring book impact based on the multi-granularity online review mining. Scientometrics 107(3), 1435–1455 (2016). https://doi.org/10.1007/s11192-016-1930-5

  167. Zimmer, M.: Addressing conceptual gaps in big data research ethics: an application of contextual integrity. Soc. Media+ Soc. 4(2), 2056305118768300 (2018)

  168. Zimmer, M.: “But the data is already public”: on the ethics of research in Facebook. In: The Ethics of Information Technologies, pp. 229–241. Routledge (2020)

  169. Zuccala, A.A., Verleysen, F.T., Cornacchia, R., Engels, T.C.: Altmetrics for the humanities: comparing goodreads reader ratings with citations to history books. Aslib J. Inf. Manag. (2015)

Author information

Corresponding author

Correspondence to Yuerong Hu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, Y., Layne-Worthey, G., Martaus, A., Downie, J.S., Diesner, J. (2023). Research with User-Generated Book Review Data: Legal and Ethical Pitfalls and Contextualized Mitigations. In: Sserwanga, I., et al. Information for a Better World: Normality, Virtuality, Physicality, Inclusivity. iConference 2023. Lecture Notes in Computer Science, vol 13971. Springer, Cham. https://doi.org/10.1007/978-3-031-28035-1_13

  • DOI: https://doi.org/10.1007/978-3-031-28035-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-28034-4

  • Online ISBN: 978-3-031-28035-1

  • eBook Packages: Computer Science, Computer Science (R0)
