From Task Tuning to Task Assignment in Privacy-Preserving Crowdsourcing Platforms

  • Chapter
Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV

Abstract

Specialized worker profiles on crowdsourcing platforms may contain a large amount of identifying and possibly sensitive personal information (e.g., personal preferences, skills, available slots, available devices), which raises strong privacy concerns. This has led to the design of privacy-preserving crowdsourcing platforms, which aim to enable efficient crowdsourcing processes while providing strong privacy guarantees even when the platform is not fully trusted. In this paper, we make two contributions. First, we propose the PKD algorithm, whose goal is to support a wide variety of aggregate usages of worker profiles within a privacy-preserving crowdsourcing platform. The PKD algorithm combines homomorphic encryption and differential privacy to compute (perturbed) partitions of the multi-dimensional space of skills of the actual worker population, together with a (perturbed) COUNT of workers per partition. Second, we propose to leverage recent progress in Private Information Retrieval (PIR) techniques in order to design a task assignment solution that is both private and affordable. We perform an in-depth study of the problem of using PIR techniques for proposing tasks to workers, show that it is NP-hard, and devise the PKD PIR Packing heuristic, which groups tasks together according to the partitioning output by the PKD algorithm. In a nutshell, we design the PKD algorithm and the PKD PIR Packing heuristic, we formally prove their security against honest-but-curious workers and/or platform, we analyze their complexities, and we demonstrate their quality and affordability in real-life scenarios through an extensive experimental evaluation performed over both synthetic and realistic datasets.
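
To make the idea of a perturbed partitioning of the skill space with perturbed counts more concrete, here is a minimal sketch of a generic differentially private kd-tree, in the spirit of differentially private spatial decompositions [9] built on kd-trees [4]. It is not the PKD algorithm: it runs in plaintext on a single trusted machine and omits homomorphic encryption and distributed noise generation entirely. It assumes skill levels normalized to [0, 1], 16 equal-width histogram bins per split, Laplace noise, add/remove-one-worker neighboring datasets, and an arbitrary budget split between medians and counts; all names (`laplace`, `noisy_median`, `Leaf`, `private_kd`) are hypothetical.

```python
import random
from dataclasses import dataclass


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_median(values, lo, hi, bins, eps, rng):
    """Estimate the median of `values` (all in [lo, hi]) from a Laplace-perturbed histogram."""
    if hi <= lo:
        return lo  # degenerate interval: nothing to split
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    # Each worker falls in exactly one bin, so the histogram has sensitivity 1.
    noisy = [max(0.0, c + laplace(1 / eps, rng)) for c in counts]
    half, acc = sum(noisy) / 2, 0.0
    for i, c in enumerate(noisy):
        if c > 0 and acc + c >= half:
            # Interpolate linearly inside the bin that contains the noisy median.
            return min(hi, lo + width * (i + (half - acc) / c))
        acc += c
    return (lo + hi) / 2  # every bin clamped to 0: any split value is equivalent


@dataclass
class Leaf:
    box: list            # one (lo, hi) interval per skill dimension
    noisy_count: float   # Laplace-perturbed number of workers in the box


def private_kd(points, box, depth, eps_split, eps_count, rng, bins=16):
    """Split `box` on noisy medians, cycling through dimensions, for `depth` levels.

    Nodes of one level are disjoint, so each worker is charged eps_split per level
    plus eps_count for its leaf: total budget = depth * eps_split + eps_count.
    """
    if depth == 0:
        return [Leaf(box, len(points) + laplace(1 / eps_count, rng))]
    dim = depth % len(box)
    lo, hi = box[dim]
    split = noisy_median([p[dim] for p in points], lo, hi, bins, eps_split, rng)
    left_box, right_box = list(box), list(box)
    left_box[dim], right_box[dim] = (lo, split), (split, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return (private_kd(left, left_box, depth - 1, eps_split, eps_count, rng, bins)
            + private_kd(right, right_box, depth - 1, eps_split, eps_count, rng, bins))


# Usage: 1,000 synthetic workers with two skill levels; total epsilon = 3 * 0.3 + 0.1 = 1.0.
rng = random.Random(7)
workers = [(rng.random(), rng.random()) for _ in range(1000)]
leaves = private_kd(workers, [(0.0, 1.0), (0.0, 1.0)], depth=3,
                    eps_split=0.3, eps_count=0.1, rng=rng)
for leaf in leaves:
    print([(round(a, 2), round(b, 2)) for a, b in leaf.box], round(leaf.noisy_count, 1))
```

Using a histogram for each median keeps the sensitivity at 1 per level; the price is that the split is only as precise as the bin width, and the noisy counts of sparse leaves can be negative or badly off.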

Notes

  1. https://www.mturk.com/.

  2. https://www.handy.com/.

  3. https://www.kicklox.com/.

  4. https://tara.ai/.

  5. In this paper, we adopt a broad definition of crowdsourcing that includes, in particular, freelancing platforms (similarly to [29]).

  6. https://requester.mturk.com/pricing.

  7. See, for example, the Kicklox search form (https://www.kicklox.com/en/), which takes as input a list of keywords (typically skills) and displays the corresponding number of available workers.

  8. See, for example, Kicklox (https://www.kicklox.com/en/) or Tara (https://tara.ai/). The secondary usage that consists of promoting the platform is sometimes achieved through public access to detailed parts of worker profiles (e.g., Malt (https://www.malt.com/), 404works (https://www.404works.com/en/freelancers)).

  9. See, e.g., http://applymagicsauce.com/about-us.

  10. For example, internal emails leaked from Deliveroo indicate that its geolocation system was used internally to identify the riders who participated in strikes against the platform. https://www.lemonde.fr/culture/article/2019/09/24/television-cash-investigation-a-la-rencontre-des-nouveaux-proletaires-du-web_6012758_3246.html.

  11. In another example, an Uber executive claimed to have tracked a journalist using the company's geolocation system. https://tinyurl.com/y4cdvw45.

  12. https://eur-lex.europa.eu/eli/reg/2016/679/oj.

  13. https://www.caprivacy.org/.

  14. Note that limiting the information disclosed to the platform (e.g., disclosing only perturbed information about worker profiles) relieves platforms of the costly task of handling personal data. The European GDPR indeed explicitly excludes anonymized data from its scope (see Article 4, Recital 26, https://gdpr-info.eu/recitals/no-26/).

  15. We require that the sum of \(|\mathcal{P}| - \tau\) noise-shares be sufficient to satisfy differential privacy, but we actually sum \(|\mathcal{P}|\) noise-shares. Summing more noise-shares than necessary does not jeopardize the privacy guarantees.

  16. Note that in the specific case where the median falls within a bin equal to 0 (i.e., \(\widetilde{b_{*,k}} = 0\)), any value within \(\phi_k\) is equivalent.

  17. Note that the perturbed histograms could have been used to compute these counts, but using a dedicated count has been shown to yield higher precision.

  18. Even if the identity of workers is not directly revealed, downloads can be matched together (for example, by using the time of downloads, cookies, or other identification techniques) to break unlinkability and deduce that they come from the same individual.

  19. In general, more files can be downloaded in each worker session, but this does not significantly impact the overall amount of computation and does not impact the minimum download size for workers at all.

  20. StackExchange is a set of online forums where users post questions and answers and vote for good answers: https://archive.org/download/stackexchange.

  21. The scripts for generating our dataset are available online: https://gitlab.inria.fr/crowdguard-public/data/workers-stackoverflow.

  22. The ten common skills considered are: .net, html, javascript, css, php, c, c#, c++, ruby, lisp.

  23. http://cs.utdallas.edu/dspl/cgi-bin/pailliertoolbox/index.php?go=download.

  24. https://github.com/XPIR-team/XPIR.

  25. If it cannot, accesses to secondary storage are necessary, which would increase the runtime accordingly. However, since the library is scanned sequentially once per query, the cost would remain linear in the size of the library.

  26. In this kind of method, a task can either be kept in its starting period until the end of its lifespan, or one can keep only a limited number of periods (e.g., all daily periods of the current month) and re-add tasks to the packing of new periods each time they are deleted (e.g., for tasks meant to last longer than a month). More elaborate or intermediate methods are also possible, but we do not explore this compromise in this paper.
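
Note 15 sums noise-shares contributed by the participants. One standard way to obtain such shares, which we assume here for illustration, relies on the infinite divisibility of the Laplace distribution [36, 37]: if \(G_i\) and \(G'_i\) are i.i.d. \(\mathrm{Gamma}(1/n, b)\) variables, then \(\sum_{i=1}^{n} (G_i - G'_i) \sim \mathrm{Lap}(0, b)\). The sketch below draws one share per participant with \(n = |\mathcal{P}| - \tau\), so that any \(|\mathcal{P}| - \tau\) shares already yield Laplace noise of scale \(b\), and checks empirically that summing all \(|\mathcal{P}|\) shares only inflates the variance by a factor \(|\mathcal{P}| / (|\mathcal{P}| - \tau)\). The function names and parameter values (\(|\mathcal{P}| = 10\), \(\tau = 2\), \(b = 1\)) are hypothetical.

```python
import random
import statistics


def noise_share(n_required, scale, rng):
    """One noise-share: the difference of two Gamma(1/n_required, scale) draws.

    Summing n_required independent shares yields exactly Laplace(0, scale);
    each additional share only adds variance 2 * scale**2 / n_required.
    """
    shape = 1.0 / n_required
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)


def aggregated_noise(num_participants, tau, scale, rng):
    """Sum the shares of all |P| participants; shares are sized so |P| - tau suffice."""
    n_required = num_participants - tau
    return sum(noise_share(n_required, scale, rng) for _ in range(num_participants))


rng = random.Random(42)
P, TAU, B = 10, 2, 1.0
samples = [aggregated_noise(P, TAU, B, rng) for _ in range(20000)]
# Laplace variance is 2 * b**2; the extra tau shares inflate it by P / (P - TAU).
expected_var = 2 * B**2 * P / (P - TAU)
print(statistics.mean(samples), statistics.pvariance(samples), expected_var)
```

Because the extra shares are independent of the required ones, the total noise is a Laplace variable plus independent noise, which is why summing more shares than necessary cannot weaken the guarantee.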

References

  1. Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order-preserving encryption for numeric data. In: Proceedings of SIGMOD 2004, pp. 563–574 (2004)

  2. Aguilar-Melchor, C., Barrier, J., Fousse, L., Killijian, M.O.: XPIR: private information retrieval for everyone. In: Proceedings of PET 2016, vol. 2016, no. 2, pp. 155–174 (2016)

  3. Allahbakhsh, M., Benatallah, B., Ignjatovic, A., Motahari-Nezhad, H.R., Bertino, E., Dustdar, S.: Quality control in crowdsourcing systems: issues and directions. IEEE Internet Comput. 17(2), 76–81 (2013)

  4. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  5. Béziaud, L., Allard, T., Gross-Amblard, D.: Lightweight privacy-preserving task assignment in skill-aware crowdsourcing. In: Benslimane, D., Damiani, E., Grosky, W.I., Hameurlain, A., Sheth, A., Wagner, R.R. (eds.) DEXA 2017. LNCS, vol. 10439, pp. 18–26. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64471-4_2

  6. Boldyreva, A., Chenette, N., O’Neill, A.: Order-preserving encryption revisited: improved security analysis and alternative solutions. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 578–595. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_33

  7. Chor, B., Goldreich, O., Kushilevitz, E., Sudan, M.: Private information retrieval. In: Proceedings of FOCS 1995, pp. 41–50 (1995)

  8. Cohen, A., Nissim, K.: Linear program reconstruction in practice. CoRR (2018)

  9. Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., Yu, T.: Differentially private spatial decompositions. In: Proceedings of ICDE 2012, pp. 20–31 (2012)

  10. Damgård, I., Jurik, M.: A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In: Kim, K. (ed.) PKC 2001. LNCS, vol. 1992, pp. 119–136. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44586-2_9

  11. Dinur, I., Nissim, K.: Revealing information while preserving privacy. In: Proceedings of SIGACT-SIGMOD-SIGART 2003, pp. 202–210 (2003)

  12. Dwork, C.: Differential privacy. In: Proceedings of ICALP 2006, pp. 1–12 (2006)

  13. Dwork, C., Roth, A.: The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9(3–4), 211–407 (2014)

  14. Finnerty, A., Kucherbaev, P., Tranquillini, S., Convertino, G.: Keep it simple: reward and task design in crowdsourcing. In: Proceedings of SIGCHI 2013, pp. 14:1–14:4 (2013)

  15. Future of work participants: imagine all the people and AI in the future of work. ACM SIGMOD Blog Post (2019)

  16. Ghosh, A., Roughgarden, T., Sundararajan, M.: Universally utility-maximizing privacy mechanisms. SIAM J. Comput. 41(6), 1673–1693 (2012)

  17. Goldreich, O.: Foundations of cryptography-a primer. Found. Trends® Theor. Comput. Sci. 1(1), 1–116 (2005)

  18. Gupta, T., Crooks, N., Mulhern, W., Setty, S.T., Alvisi, L., Walfish, M.: Scalable and private media consumption with popcorn. In: Proceedings of NSDI 2016, pp. 91–107 (2016)

  19. Hay, M., Machanavajjhala, A., Miklau, G., Chen, Y., Zhang, D.: Principled evaluation of differentially private algorithms using DPBench. In: Proceedings of SIGMOD 2016, pp. 139–154. ACM (2016)

  20. Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 3(1–2), 1021–1032 (2010)

  21. Kajino, H.: Privacy-preserving crowdsourcing. Ph.D. thesis, University of Tokyo (2015)

  22. Karmarkar, N., Karp, R.M.: The differencing method of set partitioning. Computer Science Division (EECS), University of California Berkeley (1982)

  23. Kellaris, G., Papadopoulos, S., Papadias, D.: Engineering methods for differentially private histograms: efficiency beyond utility. IEEE TKDE 31(2), 315–328 (2018)

  24. Kucherbaev, P., Daniel, F., Tranquillini, S., Marchese, M.: Crowdsourcing processes: a survey of approaches and opportunities. IEEE Internet Comput. 20(2), 50–56 (2015)

  25. Kulkarni, A., Can, M., Hartmann, B.: Collaboratively crowdsourcing workflows with turkomatic. In: Proceedings of CSCW 2012, pp. 1003–1012 (2012)

  26. Kulkarni, A.P., Can, M., Hartmann, B.: Turkomatic: automatic, recursive task and workflow design for mechanical turk. In: Proceedings of HCOMP 2011 (2011)

  27. Lease, M., et al.: Mechanical turk is not anonymous. SSRN Electron. J. (2013). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2228728

  28. Li, G., Wang, J., Zheng, Y., Franklin, M.J.: Crowdsourced data management: a survey. IEEE TKDE 28(9), 2296–2319 (2016)

  29. Lu, Y., Tang, Q., Wang, G.: Zebralancer: Private and anonymous crowdsourcing system atop open blockchain. In: Proceedings of ICDCS 2018, pp. 853–865. IEEE (2018)

  30. Mavridis, P., Gross-Amblard, D., Miklós, Z.: Using hierarchical skills for optimized task assignment in knowledge-intensive crowdsourcing. In: Proceedings of WWW 2016, pp. 843–853 (2016)

  31. Mironov, I., Pandey, O., Reingold, O., Vadhan, S.: Computational differential privacy. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 126–142. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_8

  32. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–238. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_16

  33. Qardaji, W., Yang, W., Li, N.: Differentially private grids for geospatial data. In: Proceedings of ICDE 2013, pp. 757–768 (2013)

  34. Qardaji, W., Yang, W., Li, N.: Understanding hierarchical methods for differentially private histograms. Proc. VLDB Endow. 6(14), 1954–1965 (2013)

  35. Srba, I., Bielikova, M.: A comprehensive survey and classification of approaches for community question answering. ACM TWEB 10(3), 1–63 (2016). Article no. 18. https://dl.acm.org/toc/tweb/2016/10/3

  36. Steutel, F.W., Kent, J.T., Bondesson, L., Barndorff-Nielsen, O.: Infinite divisibility in theory and practice [with discussion and reply]. Scand. J. Stat. 6(2), 57–64 (1979)

  37. Steutel, F.W., Van Harn, K.: Infinite divisibility of probability distributions on the real line (2003)

  38. To, H., Ghinita, G., Shahabi, C.: A framework for protecting worker location privacy in spatial crowdsourcing. Proc. VLDB Endow. 7(10), 919–930 (2014)

  39. To, H., Shahabi, C., Xiong, L.: Privacy-preserving online task assignment in spatial crowdsourcing with untrusted server. In: Proceedings of ICDE 2018, pp. 833–844 (2018)

  40. Xia, H., Wang, Y., Huang, Y., Shah, A.: Our privacy needs to be protected at all costs: crowd workers’ privacy experiences on amazon mechanical turk. In: Proceedings of HCI 2017, vol. 1 (2017). Article no. 113

  41. Zhai, D., et al.: Towards secure and truthful task assignment in spatial crowdsourcing. World Wide Web 22, 2017–2040 (2019). https://doi.org/10.1007/s11280-018-0638-2

  42. Zhang, J., Xiao, X., Xie, X.: PrivTree: a differentially private algorithm for hierarchical decompositions. In: Proceedings of SIGMOD 2016, pp. 155–170 (2016)

Author information

Correspondence to Joris Duguépéroux.

Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Duguépéroux, J., Allard, T. (2020). From Task Tuning to Task Assignment in Privacy-Preserving Crowdsourcing Platforms. In: Hameurlain, A., Tjoa, A.M., Lamarre, P., Zeitouni, K. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV. Lecture Notes in Computer Science, vol 12380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62271-1_3

  • DOI: https://doi.org/10.1007/978-3-662-62271-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-62270-4

  • Online ISBN: 978-3-662-62271-1

  • eBook Packages: Computer Science, Computer Science (R0)
