Abstract
Specialized worker profiles on crowdsourcing platforms may contain a large amount of identifying and possibly sensitive personal information (e.g., personal preferences, skills, available slots, available devices), raising strong privacy concerns. This has led to the design of privacy-preserving crowdsourcing platforms that aim at enabling efficient crowdsourcing processes while providing strong privacy guarantees even when the platform is not fully trusted. In this paper, we make two contributions. First, we propose the PKD algorithm, with the goal of supporting a large variety of aggregate usages of worker profiles within a privacy-preserving crowdsourcing platform. The PKD algorithm combines homomorphic encryption and differential privacy to compute (perturbed) partitions of the multi-dimensional space of skills of the actual population of workers and a (perturbed) COUNT of workers per partition. Second, we propose to benefit from recent progress in Private Information Retrieval (PIR) techniques in order to design a solution to task assignment that is both private and affordable. We perform an in-depth study of the problem of using PIR techniques for proposing tasks to workers, show that it is NP-Hard, and come up with the PKD PIR Packing heuristic, which groups tasks together according to the partitioning output by the PKD algorithm. In a nutshell, we design the PKD algorithm and the PKD PIR Packing heuristic, formally prove their security against honest-but-curious workers and/or platform, analyze their complexities, and demonstrate their quality and affordability in real-life scenarios through an extensive experimental evaluation performed over both synthetic and realistic datasets.
Notes
- 5.
We adopt in this paper a broad definition of crowdsourcing, including in particular freelancing platforms (similarly to [29]).
- 7.
See, for example, the Kicklox search form (https://www.kicklox.com/en/), which takes a list of keywords (typically skills) as input and displays the corresponding number of available workers.
- 8.
See, for example, Kicklox (https://www.kicklox.com/en/) or Tara (https://tara.ai/). The secondary usage, which consists in promoting the platform, is sometimes performed through public access to detailed parts of worker profiles (e.g., Malt (https://www.malt.com/), 404works (https://www.404works.com/en/freelancers)).
- 9.
See, e.g., http://applymagicsauce.com/about-us.
- 10.
For example, internal emails leaked from Deliveroo indicate that Deliveroo's geolocation system was used internally to identify the riders who participated in strikes against the platform. https://www.lemonde.fr/culture/article/2019/09/24/television-cash-investigation-a-la-rencontre-des-nouveaux-proletaires-du-web_6012758_3246.html.
- 11.
In another example, an Uber executive claimed to have tracked a journalist using the company's geolocation system. https://tinyurl.com/y4cdvw45.
- 14.
Note that limiting the information disclosed to the platform (e.g., perturbed information about worker profiles) relieves platforms from the costly task of handling personal data. The European GDPR indeed explicitly excludes anonymized data from its scope (see Article 4, Recital 26: https://gdpr-info.eu/recitals/no-26/).
- 15.
We require that the sum of \(|\mathcal {P}|-\tau \) noise-shares be enough to satisfy differential privacy but we effectively sum \(|\mathcal {P}|\) noise-shares. Note that summing more noise-shares than necessary does not jeopardize privacy guarantees.
- 16.
Note that in the specific case where the median falls within a bin equal to 0 (i.e., \(\widetilde{b_{*,k}}=0\)), then any value within \(\phi _k\) is equivalent.
- 17.
Note that the perturbed histograms could have been used for computing these counts, but using a dedicated count has been shown to result in increased precision.
- 18.
Even if the identity of workers is not directly revealed, it is possible to match downloads together to break unlinkability and deduce that these downloads come from the same individual, for example by using the time of downloads, cookies or other identification techniques.
- 19.
In general, more files can be downloaded during each worker session, but this does not significantly impact the overall amount of computation and does not impact at all the minimum download size for workers.
- 20.
StackExchange is a set of online forums where users post questions and answers and vote for good answers (https://archive.org/download/stackexchange).
- 21.
The scripts for generating our dataset are available online: https://gitlab.inria.fr/crowdguard-public/data/workers-stackoverflow.
- 22.
The ten common skills considered are the following: .net, html, javascript, css, php, c, c#, c++, ruby, lisp.
- 25.
If it cannot, accesses to the secondary storage device are necessary. This would increase the runtime accordingly. However, since the library is scanned once per query, sequentially, the cost would remain linear in the size of the library.
- 26.
In this kind of method, a task can either be kept in its starting period until the end of its lifespan, or one can keep a limited number of periods (e.g., all daily periods for the current month) and re-add tasks to the packing of new periods each time they are deleted (e.g., for tasks that are meant to last longer than a month). More elaborate or intermediate methods are also possible, but we will not explore this compromise in this paper.
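The noise-share calibration described in note 15 can be illustrated with a minimal Python sketch. It is not the PKD protocol itself: shares are summed in the clear here (the chapter sums them under homomorphic encryption), and the function names `noise_share` and `perturbed_count`, the choice of sensitivity 1 for COUNT, and the use of differences of Gamma draws as shares are assumptions made for illustration. The construction relies on a standard fact: the Laplace distribution is infinitely divisible, so the sum of n independent differences of two Gamma(1/n, b) variables follows Laplace(0, b).

```python
import random


def noise_share(scale, n_required, rng):
    """Return one noise-share: the difference of two Gamma(1/n_required, scale) draws.

    Summing n_required independent shares yields a Laplace(0, scale) variable,
    because the Laplace distribution is infinitely divisible.
    """
    shape = 1.0 / n_required
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)


def perturbed_count(true_count, epsilon, n_participants, tau, rng):
    """Perturb a COUNT (assumed sensitivity 1) with one share per participant.

    Shares are calibrated so that any n_participants - tau of them already
    sum to Laplace(1/epsilon) noise, yet all n_participants shares are added,
    as in note 15.
    """
    n_required = n_participants - tau
    if n_required <= 0:
        raise ValueError("tau must be smaller than the number of participants")
    scale = 1.0 / epsilon
    shares = [noise_share(scale, n_required, rng) for _ in range(n_participants)]
    return true_count + sum(shares)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values: 100 participants, tau = 10, epsilon = 0.5.
    print(perturbed_count(true_count=42, epsilon=0.5, n_participants=100, tau=10, rng=rng))
```

Because every participant contributes a share calibrated for the smaller group, the final sum carries somewhat more noise than strictly required, which costs some precision but, as note 15 points out, does not weaken the privacy guarantee.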
References
Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order-preserving encryption for numeric data. In: Proceedings of SIGMOD 2004, pp. 563–574 (2004)
Aguilar-Melchor, C., Barrier, J., Fousse, L., Killijian, M.O.: XPIR: private information retrieval for everyone. In: Proceedings of PET 2016, vol. 2016, no. 2, pp. 155–174 (2016)
Allahbakhsh, M., Benatallah, B., Ignjatovic, A., Motahari-Nezhad, H.R., Bertino, E., Dustdar, S.: Quality control in crowdsourcing systems: issues and directions. IEEE Internet Comput. 17(2), 76–81 (2013)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Béziaud, L., Allard, T., Gross-Amblard, D.: Lightweight privacy-preserving task assignment in skill-aware crowdsourcing. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10439, pp. 18–26. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64471-4_2
Boldyreva, A., Chenette, N., O’Neill, A.: Order-preserving encryption revisited: improved security analysis and alternative solutions. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 578–595. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_33
Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. In: Proceedings of FOCS 1995, pp. 41–50 (1995)
Cohen, A., Nissim, K.: Linear program reconstruction in practice. CoRR (2018)
Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: Proceedings of ICDE 2012, pp. 20–31 (2012)
Damgård, I., Jurik, M.: A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In: Kim, K. (ed.) PKC 2001. LNCS, vol. 1992, pp. 119–136. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44586-2_9
Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of SIGACT-SIGMOD-SIGART 2003, pp. 202–210 (2003)
Dwork, C.: Differential privacy. In: Proceedings of ICALP 2006, pp. 1–12 (2006)
Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)
Finnerty, A., Kucherbaev, P., Tranquillini, S., Convertino, G.: Keep it simple: reward and task design in crowdsourcing. In: Proceedings of SIGCHI 2013, pp. 14:1–14:4 (2013)
Future of work participants: imagine all the people and AI in the future of work. ACM SIGMOD Blog Post (2019)
Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41(6), 1673–1693 (2012)
Goldreich, O.: Foundations of cryptography: a primer. Found. Trends Theor. Comput. Sci. 1(1), 1–116 (2005)
Gupta, T., Crooks, N., Mulhern, W., Setty, S.T., Alvisi, L., Walfish, M.: Scalable and private media consumption with popcorn. In: Proceedings of NSDI 2016, pp. 91–107 (2016)
Hay, M., Machanavajjhala, A., Miklau, G., Chen, Y., Zhang, D.: Principled evaluation of differentially private algorithms using DPBench. In: Proceedings of SIGMOD 2016, pp. 139–154. ACM (2016)
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1–2), 1021–1032 (2010)
Kajino, H.: Privacy-preserving crowdsourcing. Ph.D. thesis, University of Tokyo (2015)
Karmarkar, N., Karp, R.M.: The differencing method of set partitioning. Computer Science Division (EECS), University of California Berkeley (1982)
Kellaris, G., Papadopoulos, S., Papadias, D.: Engineering methods for differentially private histograms: efficiency beyond utility. IEEE TKDE 31(2), 315–328 (2018)
Kucherbaev, P., Daniel, F., Tranquillini, S., Marchese, M.: Crowdsourcing processes: a survey of approaches and opportunities. IEEE Internet Comput. 20(2), 50–56 (2015)
Kulkarni, A., Can, M., Hartmann, B.: Collaboratively crowdsourcing workflows with turkomatic. In: Proceedings of CSCW 2012, pp. 1003–1012 (2012)
Kulkarni, A.P., Can, M., Hartmann, B.: Turkomatic: automatic, recursive task and workflow design for mechanical turk. In: Proceedings of HCOMP 2011 (2011)
Lease, M., et al.: Mechanical turk is not anonymous. SSRN Electron. J. (2013). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2228728
Li, G., Wang, J., Zheng, Y., Franklin, M.J.: Crowdsourced data management: a survey. IEEE TKDE 28(9), 2296–2319 (2016)
Lu, Y., Tang, Q., Wang, G.: Zebralancer: Private and anonymous crowdsourcing system atop open blockchain. In: Proceedings of ICDCS 2018, pp. 853–865. IEEE (2018)
Mavridis, P., Gross-Amblard, D., Miklós, Z.: Using hierarchical skills for optimized task assignment in knowledge-intensive crowdsourcing. In: Proceedings of WWW 2016, pp. 843–853 (2016)
Mironov, I., Pandey, O., Reingold, O., Vadhan, S.: Computational differential privacy. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 126–142. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_8
Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_16
Qardaji, W., Yang, W., Li, N.: Differentially private grids for geospatial data. In: Proceedings of ICDE 2013, pp. 757–768 (2013)
Qardaji, W., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. Proc. VLDB Endow. 6(14), 1954–1965 (2013)
Srba, I., Bielikova, M.: A comprehensive survey and classification of approaches for community question answering. ACM TWEB 10(3), 1–63 (2016). Article no. 18. https://dl.acm.org/toc/tweb/2016/10/3
Steutel, F.W., Kent, J.T., Bondesson, L., Barndorff-Nielsen, O.: Infinite divisibility in theory and practice [with discussion and reply]. Scand. J. Stat. 6(2), 57–64 (1979)
Steutel, F.W., Van Harn, K.: Infinite divisibility of probability distributions on the real line (2003)
To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. Proc. VLDB Endow. 7(10), 919–930 (2014)
To, H., Shahabi, C., Xiong, L.: Privacy-preserving online task assignment in spatial crowdsourcing with untrusted server. In: Proceedings of ICDE 2018, pp. 833–844 (2018)
Xia, H., Wang, Y., Huang, Y., Shah, A.: Our privacy needs to be protected at all costs: crowd workers’ privacy experiences on amazon mechanical turk. In: Proceedings of HCI 2017, vol. 1 (2017). Article no. 113
Zhai, D., et al.: Towards secure and truthful task assignment in spatial crowdsourcing. World Wide Web 22, 2017–2040 (2019). https://doi.org/10.1007/s11280-018-0638-2
Zhang, J., Xiao, X., Xie, X.: PrivTree: a differentially private algorithm for hierarchical decompositions. In: Proceedings of SIGMOD 2016, pp. 155–170 (2016)
Copyright information
© 2020 Springer-Verlag GmbH Germany, part of Springer Nature
About this chapter
Cite this chapter
Duguépéroux, J., Allard, T. (2020). From Task Tuning to Task Assignment in Privacy-Preserving Crowdsourcing Platforms. In: Hameurlain, A., Tjoa, A.M., Lamarre, P., Zeitouni, K. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV. Lecture Notes in Computer Science, vol 12380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62271-1_3
DOI: https://doi.org/10.1007/978-3-662-62271-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-62270-4
Online ISBN: 978-3-662-62271-1
eBook Packages: Computer Science (R0)