A Peer-to-Peer Crowdsourcing Platform for the Labeled Datasets Forming

Melnik, E. V.; Klimenko, A. B.

doi:10.1007/978-3-030-51974-2_9


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1226)

Included in the following conference series:

Computer Science On-line Conference


Abstract

Forming a training dataset is labor-intensive and often costly. The cost overhead also depends on the policies of the service that implements the crowdsourcing approach to data labeling. This paper presents a new concept of a peer-to-peer data labeling platform and briefly describes the framework of the decentralized labeling approach. The proposed architecture removes the need for an intermediary labeling service and performs crowdsourcing-based data labeling on the computational facilities of the users involved. In addition, a consensus procedure based on voting improves the quality of the labeled data.
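The abstract does not specify how the voting-based consensus works, so the following is only a minimal sketch of one common choice, plain majority voting over the labels proposed by several peers. The function name `majority_label`, the `quorum` threshold, and the sample votes are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_label(votes: Iterable[Hashable], quorum: float = 0.5) -> Optional[Hashable]:
    """Return the label backed by more than `quorum` of the votes, else None.

    Assumption: a simple majority rule; the paper's actual consensus
    procedure may differ.
    """
    votes = list(votes)
    if not votes:
        return None
    # most_common(1) yields the single (label, count) pair with the highest count.
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > quorum else None


# Hypothetical votes from five peers on two items.
peer_votes = {
    "img_001": ["cat", "cat", "dog", "cat", "cat"],
    "img_002": ["cat", "dog", "bird", "dog", "cat"],
}
consensus = {item: majority_label(v) for item, v in peer_votes.items()}
print(consensus)  # {'img_001': 'cat', 'img_002': None}
```

Under this sketch, an item with no label above the quorum stays unlabeled (`None`) and could be sent back for further voting, rather than accepting a noisy label.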



Acknowledgements

This paper was prepared within RFBR projects 18-29-22086 and 18-29-03229.

Author information

Authors and Affiliations

Federal Research Centre, The Southern Scientific Centre of the Russian Academy of Sciences, 41, Chekhov Street, 344006, Rostov-on-Don, Russian Federation
E. V. Melnik
Scientific Research Institute of Multiprocessor Computer Systems of Southern Federal University, 2, Chekhov Street, 347928, Taganrog, Russian Federation
A. B. Klimenko


Corresponding author

Correspondence to A. B. Klimenko.

Editor information

Editors and Affiliations

Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Radek Silhavy


About this paper

Cite this paper

Melnik, E.V., Klimenko, A.B. (2020). A Peer-to-Peer Crowdsourcing Platform for the Labeled Datasets Forming. In: Silhavy, R. (eds) Applied Informatics and Cybernetics in Intelligent Systems. CSOC 2020. Advances in Intelligent Systems and Computing, vol 1226. Springer, Cham. https://doi.org/10.1007/978-3-030-51974-2_9


DOI: https://doi.org/10.1007/978-3-030-51974-2_9
Published: 08 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51973-5
Online ISBN: 978-3-030-51974-2
eBook Packages: Intelligent Technologies and Robotics (R0)
