Abstract
Building a training dataset is labor-intensive and often expensive. The cost overhead also depends on the policies of the service that implements the crowdsourcing approach to data labeling. This paper presents a concept for a new peer-to-peer data labeling platform and briefly describes the framework of the decentralized labeling approach. The proposed architecture removes the need for an intermediary labeling service: crowdsourced data labeling is performed on the computational facilities of the participating users. In addition, a consensus procedure based on voting improves the quality of the labeled data.
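The abstract does not specify how the voting-based consensus works, so the following is only a minimal sketch of one common choice, a strict-majority vote over the labels that several peers assign to the same item. The function name `majority_label`, the `min_share` threshold, and the sample item identifiers are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def majority_label(votes: Sequence[Hashable], min_share: float = 0.5) -> Optional[Hashable]:
    """Return the label backed by more than `min_share` of the votes, else None.

    Hypothetical helper: the paper's actual consensus rule is not given in the abstract.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Require a strict majority so that ties are reported as "no consensus".
    if count / len(votes) > min_share:
        return label
    return None


# Hypothetical votes submitted by peers for two items.
votes_by_item: Dict[str, List[str]] = {
    "img_001": ["cat", "cat", "dog", "cat", "cat"],
    "img_002": ["cat", "dog"],
}
consensus = {item: majority_label(votes) for item, votes in votes_by_item.items()}
```

Under these assumptions, an item on which the peers are evenly split receives no label and could be sent back for additional votes, which is one way a voting step can filter out noisy labels.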
Acknowledgements
The paper has been prepared within the RFBR projects 18-29-22086 and 18-29-03229.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Melnik, E.V., Klimenko, A.B. (2020). A Peer-to-Peer Crowdsourcing Platform for the Labeled Datasets Forming. In: Silhavy, R. (eds) Applied Informatics and Cybernetics in Intelligent Systems. CSOC 2020. Advances in Intelligent Systems and Computing, vol 1226. Springer, Cham. https://doi.org/10.1007/978-3-030-51974-2_9
DOI: https://doi.org/10.1007/978-3-030-51974-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-51973-5
Online ISBN: 978-3-030-51974-2
eBook Packages: Intelligent Technologies and Robotics (R0)