A Peer-to-Peer Crowdsourcing Platform for the Labeled Datasets Forming

  • Conference paper
Applied Informatics and Cybernetics in Intelligent Systems (CSOC 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1226)

Abstract

Forming a training dataset is labor-intensive and often costly. The overhead also depends on the policies of the service that implements the crowdsourcing approach to data labeling. This paper presents a new concept for a peer-to-peer data labeling platform and briefly describes the framework of the decentralized labeling approach. The proposed architecture removes the need for an intermediary labeling service: crowdsourced data labeling is performed on the computational facilities of the participating users. In addition, a consensus procedure based on voting improves the quality of the labeled data.
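
The abstract does not specify how the voting-based consensus works, so the following is only a minimal sketch of one common choice: a plain majority vote over the labels that several peers assign to the same item. The function names, the `quorum` threshold, and the rule of dropping items without agreement are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def consensus_label(votes: Sequence[Hashable], quorum: float = 0.5) -> Optional[Hashable]:
    """Return the label backed by more than `quorum` of the votes, else None.

    `quorum` is a hypothetical parameter: the share of votes a label must
    strictly exceed to be accepted.
    """
    if not votes:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > quorum else None


def label_dataset(
    peer_votes: Dict[Hashable, Sequence[Hashable]], quorum: float = 0.5
) -> Dict[Hashable, Hashable]:
    """Map each item id to its consensus label.

    Items whose votes do not reach consensus are left out (an assumption;
    a real platform might instead request more votes for them).
    """
    labeled = {}
    for item_id, votes in peer_votes.items():
        label = consensus_label(votes, quorum)
        if label is not None:
            labeled[item_id] = label
    return labeled
```

For example, with votes `{"img1": ["cat", "cat", "dog"], "img2": ["cat", "dog"]}` only `img1` receives a label under the default threshold, because the two votes for `img2` are tied. Raising `quorum` trades dataset size for stricter agreement among peers.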

Acknowledgements

This paper was prepared within the framework of RFBR projects 18-29-22086 and 18-29-03229.

Author information

Corresponding author

Correspondence to A. B. Klimenko.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Melnik, E.V., Klimenko, A.B. (2020). A Peer-to-Peer Crowdsourcing Platform for the Labeled Datasets Forming. In: Silhavy, R. (eds) Applied Informatics and Cybernetics in Intelligent Systems. CSOC 2020. Advances in Intelligent Systems and Computing, vol 1226. Springer, Cham. https://doi.org/10.1007/978-3-030-51974-2_9
