Abstract
Engaging humans in classification tasks has proven effective, especially for digital resources such as images or multimedia web resources, whose complex features are hard to abstract for an automated procedure. In this paper, we propose the \(\mathsf {HC^2}\) crowdclustering approach for unsupervised classification of web resources, in which the classification categories emerge dynamically from the crowd. In \(\mathsf {HC^2}\), crowd workers actively participate in clustering activities (i) by resolving tasks in which they are asked to visually recognize groups of similar resources and (ii) by labeling the recognized clusters with prominent keywords. To increase flexibility, \(\mathsf {HC^2}\) can be interactively configured to set the balance between human engagement and automated procedures in cluster formation, according to the kind and nature of the resources to be classified. For experimentation and evaluation, the \(\mathsf {HC^2}\) approach has been deployed on the Argo platform, which provides crowdsourcing techniques for consensus-based task execution.
Notes
- 1. Depending on the domain of the considered resources, the WordNet lexical system can be replaced by other kinds of supporting knowledge bases, such as folksonomies, shared vocabularies, and domain ontologies.
- 2. Worker trustworthiness is set on the basis of the task answers provided by the worker. It is increased when the worker's answer contributes to reaching consensus, and decreased otherwise. See [4] for further details about the specification of worker trustworthiness.
- 3. http://island.ricerca.di.unimi.it/projects/argo/ (Italian language).
- 4. Other merging strategies are possible in hierarchical clustering (e.g., single-link strategy, average-link strategy). In this experimentation, the complete-link strategy was selected because it provided the best clustering results.
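To make the difference between the merging strategies mentioned in note 4 concrete, the following is a minimal sketch of agglomerative clustering with a pluggable linkage function. It is not the \(\mathsf {HC^2}\) implementation: the function names, the one-dimensional toy data, the absolute-difference distance, and the stopping threshold are all assumptions introduced for illustration.

```python
from itertools import combinations


def complete_link_distance(a, b, dist):
    """Complete link: distance between the two farthest members."""
    return max(dist(x, y) for x in a for y in b)


def single_link_distance(a, b, dist):
    """Single link: distance between the two closest members."""
    return min(dist(x, y) for x in a for y in b)


def agglomerate(items, dist, linkage, threshold):
    """Merge the closest pair of clusters until none is within threshold.

    Hypothetical helper: `threshold` is an illustrative stopping rule,
    not a parameter taken from the paper.
    """
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j], dist))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def abs_diff(x, y):
    """Toy distance on numbers (assumption: real resources need a richer one)."""
    return abs(x - y)


points = [1.0, 2.0, 3.0, 10.0, 11.0]
complete = agglomerate(points, abs_diff, complete_link_distance, threshold=1.5)
single = agglomerate(points, abs_diff, single_link_distance, threshold=1.5)
print(complete)  # [[1.0, 2.0], [3.0], [10.0, 11.0]]
print(single)    # [[1.0, 2.0, 3.0], [10.0, 11.0]]
```

On this toy input, single link chains 1.0, 2.0, and 3.0 into one cluster, whereas complete link keeps 3.0 apart because its distance to the farthest member of the cluster {1.0, 2.0} exceeds the threshold. Complete link therefore tends to produce more compact clusters.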
References
Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)
André, P., Kittur, A., Dow, S.P.: Crowd synthesis: extracting categories and clusters from complex data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work, Baltimore, MD, USA (2014)
Barowy, D.W., Curtsinger, C., Berger, E.D., McGregor, A.: AutoMan: a platform for integrating human-based and digital computation. In: Proceedings of the 27th Annual ACM SIGPLAN OOPSLA Conference, Tucson, AZ, USA (2012)
Castano, S., Ferrara, A., Genta, L., Montanelli, S.: Combining crowd consensus and user trustworthiness for managing collective tasks. Future Gener. Comput. Syst. 54, 378–388 (2016)
Chae, G., Park, J., Park, J., Yeo, W.S., Shi, C.: Linking and clustering artworks using social tags: revitalizing crowd-sourced information on cultural collections. J. Assoc. Inf. Sci. Technol. 67(4), 885–899 (2015)
Chen, Q., Wang, G., Tan, C.L.: Web image organization and object discovery by actively creating visual clusters through crowdsourcing. In: Proceedings of the 24th International Conference on Tools with Artificial Intelligence, Toronto, Canada (2012)
Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of Computer Vision and Pattern Recognition, San Diego, CA, USA (2005)
Ferrara, A., Genta, L., Montanelli, S., Castano, S.: Dimensional clustering of linked data: techniques and applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., Antonellis, V., Virgilio, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. LNCS, vol. 8990, pp. 55–86. Springer, Heidelberg (2015)
Gomes, R.G., Welinder, P., Krause, A., Perona, P.: Crowdclustering. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), Granada, Spain (2011)
Lee, J., Cho, H., Park, J.W., Cha, Y.R., Hwang, S.W., Nie, Z., Wen, J.R.: Hybrid entity clustering using crowds and data. VLDB J. 22(5), 711–726 (2013)
Machedon, R., Rand, W., Joshi, Y.: Automatic crowdsourcing-based classification of marketing messaging on twitter. In: Proceedings of the International Conference on Social Computing, Washington, DC, USA (2013)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada (2001)
Sun, C., Rampalli, N., Yang, F., Doan, A.: Chimera: large-scale classification using machine learning, rules, and crowdsourcing. Proc. VLDB Endow. 7(13), 1529–1540 (2014)
Wang, J., Kraska, T., Franklin, M.J., Feng, J.: Crowder: crowdsourcing entity resolution. Proc. VLDB Endow. 5(11), 1483–1494 (2012)
Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico (1994)
Yi, J., Jin, R., Jain, S., Yang, T., Jain, A.K.: Semi-crowdsourced clustering: generalizing crowd labeling by robust distance metric learning. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA (2012)
Acknowledgements
The authors would like to thank M.Sc. Riccardo Corbella for the fruitful contribution to the \(\mathsf {HC^2}\) specification and experimentation.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Castano, S., Ferrara, A., Montanelli, S. (2016). Human-in-the-Loop Web Resource Classification. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_13
DOI: https://doi.org/10.1007/978-3-319-48472-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science (R0)