research-article

Search with Discretion: Value Sensitive Design of Training Data for Information Retrieval

Authors: Modassir Iqbal, Mahmoud F. Sayed, Jonah Lynn Rivera, William Cox

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW1

Article No.: 133, Pages 1 - 20

https://doi.org/10.1145/3449207

Published: 22 April 2021

Abstract

This paper describes and assesses the value sensitive design (VSD) of a test collection: data used to train and evaluate a machine learning system for information retrieval. The project used the VSD framework and methods to design a test collection annotated for discretion. We conducted qualitative stakeholder interviews to develop values personas, which guided annotation of a collection of corporate emails for contextual notions of sensitivity. Both qualitative and quantitative evaluations of the method reveal that the values personas concretely shaped annotators' sensitivity judgments, and analysis of the test collection itself demonstrates that the sensitivity annotations have utility for identifying features that may correlate with email sensitivity. Values personas for training data annotation expand the toolkit of methods for value-sensitive machine learning.

Index Terms

Search with Discretion: Value Sensitive Design of Training Data for Information Retrieval
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
2. Information systems
  1. Information retrieval

Index terms have been assigned to the content through auto-classification.

Published In

Proceedings of the ACM on Human-Computer Interaction Volume 5, Issue CSCW1

CSCW

April 2021

5016 pages

EISSN:2573-0142

DOI:10.1145/3460939

Editor:
Jeff Nichols
Apple Inc., United States

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF
