A Crowdsourcing Repeated Annotations System for Visual Object Detection

ABSTRACT
As a fundamental task in computer vision, object detection has developed rapidly, driven by deep learning. The lack of large numbers of images with ground-truth annotations has become a chief obstacle to applying object detection in many fields. Eliciting labels from crowds is a promising way to obtain large labeled datasets. Nonetheless, existing crowdsourcing platforms, e.g., Amazon Mechanical Turk (MTurk), often fail to guarantee the quality of the annotations, which degrades the accuracy of the deep detector. A variety of methods have been developed for ground truth inference and learning from crowds. In this paper, we study strategies for crowdsourcing repeated labels in support of these methods. The core challenge in building such a system is to reduce the difficulty of annotating multiple objects of interest while improving data quality as much as possible. We present a system that adopts a turn-based annotation mechanism and consists of three simple sub-tasks: single-object annotation, a quality verification task, and a coverage verification task. Experimental results demonstrate that our system is scalable and accurate, and helps the detector achieve higher accuracy.
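The abstract refers to inferring ground truth from repeated labels. As a minimal sketch (not the paper's actual inference method), the example below aggregates repeated bounding-box annotations from several workers: boxes are greedily clustered by IoU overlap, and each cluster with enough worker votes is reduced to its coordinate-wise median box. The function names, IoU threshold, and vote threshold are all illustrative assumptions.

```python
# Hypothetical sketch of ground-truth inference from repeated bounding-box
# annotations. Boxes are (x1, y1, x2, y2) tuples in pixel coordinates.
from statistics import median

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def aggregate(boxes, iou_thr=0.5, min_votes=2):
    """Greedily cluster repeated annotations by IoU, then return one
    coordinate-wise median box per cluster that received at least
    `min_votes` worker annotations (spurious lone boxes are dropped)."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            if iou(box, cluster[0]) >= iou_thr:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    return [tuple(median(b[i] for b in c) for i in range(4))
            for c in clusters if len(c) >= min_votes]

# Three workers annotate the same object; a fourth box is spurious noise.
workers = [(10, 10, 50, 50), (12, 11, 52, 49), (9, 12, 48, 51),
           (200, 200, 240, 240)]
print(aggregate(workers))  # the lone noise box is filtered out
```

Taking the median rather than the mean makes the consensus box robust to a single careless worker, which is the usual motivation for collecting repeated labels in the first place.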