Search with Discretion: Value Sensitive Design of Training Data for Information Retrieval

Published: 22 April 2021

Abstract

This paper describes and assesses the value sensitive design (VSD) of a test collection: data used to train and evaluate a machine learning system for information retrieval. The project used the VSD framework and methods to design a test collection annotated for discretion. We conducted qualitative stakeholder interviews to develop values personas, which guided annotation of a collection of corporate emails for contextual notions of sensitivity. Both qualitative and quantitative evaluations of the method reveal that the values personas concretely shaped annotators' sensitivity judgments, and analysis of the test collection itself demonstrates that the sensitivity annotations have utility for identifying features that may correlate with email sensitivity. Values personas for training data annotation expand the toolkit of methods for value-sensitive machine learning.

      Published In

      Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW1
      CSCW
      April 2021
      5016 pages
      EISSN: 2573-0142
      DOI: 10.1145/3460939

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. email
      2. information retrieval
      3. machine learning
      4. personas
      5. privacy
      6. training data
      7. value-sensitive design

      Qualifiers

      • Research-article

      Funding Sources

      • NSF
