A Tale of Two Communities: Privacy of Third Party App Users in Crowdsourcing - The Case of Receipt Transcription

Published: 04 October 2023

Abstract

Mobile and web apps increasingly rely on data generated or provided by users, such as uploaded documents and images. Unfortunately, those apps may raise significant user privacy concerns. Specifically, to train or adapt their models to accurately process the huge amounts of data continuously collected from millions of app users, app and service providers have widely adopted crowdsourcing, recruiting crowd workers to manually annotate or transcribe samples of the ever-changing user data. However, when users' data are uploaded through apps and then become widely accessible to hundreds of thousands of anonymous crowd workers, many human-in-the-loop privacy questions arise concerning both the app user community and the crowd worker community. In this paper, we propose to investigate the privacy risks brought by this significant trend of large-scale crowd-powered processing of app users' data generated in their daily activities. We consider the representative case of receipt scanning apps that have millions of users, and focus on the corresponding receipt transcription tasks that frequently appear on crowdsourcing platforms. We design and conduct an app user survey study (n=108) to explore how app users perceive privacy in the context of using receipt scanning apps. We also design and conduct a crowd worker survey study (n=102) to explore crowd workers' experiences with receipt and other types of transcription tasks as well as their attitudes towards such tasks. Overall, we found that most app users and crowd workers expressed strong concerns about the potential privacy risks to receipt owners, and they also had a very high level of agreement with the need for protecting receipt owners' privacy. Our work provides insights on app users' potential privacy risks in crowdsourcing, and highlights the need and challenges for protecting third party users' privacy on crowdsourcing platforms. We have responsibly disclosed our findings to the related crowdsourcing platform and app providers.

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue CSCW2
CSCW
October 2023
4055 pages
EISSN:2573-0142
DOI:10.1145/3626953
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 October 2023
Published in PACMHCI Volume 7, Issue CSCW2

Author Tags

  1. app user
  2. crowdsourcing
  3. privacy
  4. receipt transcription

Qualifiers

  • Research-article
