Human-in-the-Loop Web Resource Classification

Castano, Silvana; Ferrara, Alfio; Montanelli, Stefano

doi:10.1007/978-3-319-48472-3_13


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10033)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"


Abstract

Engaging humans in classification tasks has been shown to be effective, especially for digital resources such as images or multimedia web resources, whose complex features are difficult to abstract for an automated procedure. In this paper, we propose the \(\mathsf {HC^2}\) crowdclustering approach for unsupervised classification of web resources, in which the classification categories dynamically emerge from the crowd. In \(\mathsf {HC^2}\), crowd workers actively participate in clustering activities (i) by resolving tasks in which they are asked to visually recognize groups of similar resources and (ii) by labeling the recognized clusters with prominent keywords. To increase flexibility, \(\mathsf {HC^2}\) can be interactively configured to dynamically set the balance between human engagement and automated procedures in cluster formation, according to the kind and nature of the resources to be classified. For experimentation and evaluation, the \(\mathsf {HC^2}\) approach has been deployed on the Argo platform, which provides crowdsourcing techniques for consensus-based task execution.
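As a rough illustration of consensus-based task execution with worker trustworthiness (Note 2 below says a worker's trust rises when their answer contributes to consensus and falls otherwise), the following Python sketch aggregates the answers to a single task by trust-weighted voting and then updates each worker's trust. It is not the Argo or \(\mathsf {HC^2}\) implementation: the function names, the majority threshold of 0.5, the fixed update step of 0.1, and the clamping of trust to [0, 1] are assumptions made for illustration; see [4] for the actual specification.

```python
from collections import defaultdict

# Hypothetical helpers illustrating trust-weighted consensus; not Argo's actual API.

def weighted_consensus(answers, trust, threshold=0.5):
    """Return the answer backed by more than `threshold` of the total trust, else None.

    answers: dict mapping worker id -> the worker's answer to one task
    trust:   dict mapping worker id -> trustworthiness, assumed to lie in [0, 1]
    """
    support = defaultdict(float)
    for worker, answer in answers.items():
        support[answer] += trust[worker]  # each vote weighs as much as its author's trust
    total = sum(support.values())
    if total == 0:
        return None  # no answers, or only zero-trust workers: no consensus
    best_answer, best_support = max(support.items(), key=lambda item: item[1])
    return best_answer if best_support / total > threshold else None


def update_trust(answers, trust, consensus, step=0.1):
    """Raise the trust of workers who gave the consensus answer and lower it for the others.

    The fixed `step` and clamping to [0, 1] are assumptions; see [4] for the real rule.
    """
    updated = dict(trust)
    if consensus is None:
        return updated  # no consensus reached: leave trust unchanged
    for worker, answer in answers.items():
        delta = step if answer == consensus else -step
        updated[worker] = min(1.0, max(0.0, trust[worker] + delta))
    return updated


answers = {"w1": "same group", "w2": "same group", "w3": "different groups"}
trust = {"w1": 0.8, "w2": 0.6, "w3": 0.7}
consensus = weighted_consensus(answers, trust)
print(consensus)  # same group
print(update_trust(answers, trust, consensus))
```

In this example, "same group" gathers 1.4 of the 2.1 total trust weight and becomes the consensus; w1 and w2 then gain trust while w3 loses some. Weighting votes by trust means a few reliable workers can outweigh a larger number of unreliable ones.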


Notes

1. Depending on the domain of the considered resources, the WordNet lexical system can be replaced by other kinds of supporting knowledge bases, such as folksonomies, shared vocabularies, and domain ontologies.
2. A worker's trustworthiness is set on the basis of the task answers the worker provides. It increases when the worker's answer contributes to reaching consensus and decreases otherwise. See [4] for further details about the specification of worker trustworthiness.
3. http://island.ricerca.di.unimi.it/projects/argo/ (in Italian).
4. Other merging strategies are possible in hierarchical clustering (e.g., the single-link and average-link strategies). In this experimentation, the complete-link strategy was selected because it provided the best clustering results.
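Note 4 refers to the merging strategy used in hierarchical clustering. A minimal, textbook-style sketch of agglomerative clustering over a precomputed distance matrix shows where the strategies differ: complete link scores a pair of clusters by its farthest members, single link by its closest members, and average link by the mean pairwise distance. How \(\mathsf {HC^2}\) derives distances between resources and where it cuts the hierarchy are not stated here, so the function name, the distance-matrix input, and the fixed target number of clusters are assumptions.

```python
from statistics import mean

def agglomerative_clustering(dist, n_clusters, method="complete"):
    """Naive agglomerative clustering on a symmetric distance matrix (hypothetical helper).

    dist: list of lists, dist[i][j] = distance between items i and j.
    n_clusters: stop merging when this many clusters remain.
    method: "complete", "single", or "average" linkage.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Linkage: how the distance between two clusters is derived from item distances.
    combine = {"complete": max, "single": min, "average": mean}[method]

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair found so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


if __name__ == "__main__":
    # Items 0 and 1 are close to each other, as are items 2 and 3.
    dist = [[0, 1, 6, 7], [1, 0, 5, 6], [6, 5, 0, 2], [7, 6, 2, 0]]
    print(agglomerative_clustering(dist, 2))  # [[0, 1], [2, 3]]
```

This quadratic-per-merge loop is only meant to make the linkage choice visible; for real data, an optimized routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` is the usual choice.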

References

[1] Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)
[2] André, P., Kittur, A., Dow, S.P.: Crowd synthesis: extracting categories and clusters from complex data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work, Baltimore, MD, USA (2014)
[3] Barowy, D.W., Curtsinger, C., Berger, E.D., McGregor, A.: AutoMan: a platform for integrating human-based and digital computation. In: Proceedings of the 27th Annual ACM SIGPLAN OOPSLA Conference, Tucson, AZ, USA (2012)
[4] Castano, S., Ferrara, A., Genta, L., Montanelli, S.: Combining crowd consensus and user trustworthiness for managing collective tasks. Future Gener. Comput. Syst. 54, 378–388 (2016)
[5] Chae, G., Park, J., Park, J., Yeo, W.S., Shi, C.: Linking and clustering artworks using social tags: revitalizing crowd-sourced information on cultural collections. J. Assoc. Inf. Sci. Technol. 67(4), 885–899 (2015)
[6] Chen, Q., Wang, G., Tan, C.L.: Web image organization and object discovery by actively creating visual clusters through crowdsourcing. In: Proceedings of the 24th International Conference on Tools with Artificial Intelligence, Toronto, Canada (2012)
[7] Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of Computer Vision and Pattern Recognition, San Diego, CA, USA (2005)
[8] Ferrara, A., Genta, L., Montanelli, S., Castano, S.: Dimensional clustering of linked data: techniques and applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., De Antonellis, V., De Virgilio, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. LNCS, vol. 8990, pp. 55–86. Springer, Heidelberg (2015)
[9] Gomes, R.G., Welinder, P., Krause, A., Perona, P.: Crowdclustering. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), Granada, Spain (2011)
[10] Lee, J., Cho, H., Park, J.W., Cha, Y.R., Hwang, S.W., Nie, Z., Wen, J.R.: Hybrid entity clustering using crowds and data. VLDB J. 22(5), 711–726 (2013)
[11] Machedon, R., Rand, W., Joshi, Y.: Automatic crowdsourcing-based classification of marketing messaging on Twitter. In: Proceedings of the International Conference on Social Computing, Washington, DC, USA (2013)
[12] Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
[13] Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada (2001)
[14] Sun, C., Rampalli, N., Yang, F., Doan, A.: Chimera: large-scale classification using machine learning, rules, and crowdsourcing. Proc. VLDB Endow. 7(13), 1529–1540 (2014)
[15] Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. Proc. VLDB Endow. 5(11), 1483–1494 (2012)
[16] Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico (1994)
[17] Yi, J., Jin, R., Jain, S., Yang, T., Jain, A.K.: Semi-crowdsourced clustering: generalizing crowd labeling by robust distance metric learning. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA (2011)


Acknowledgements

The authors would like to thank M.Sc. Riccardo Corbella for the fruitful contribution to the \(\mathsf {HC^2}\) specification and experimentation.

Author information

Authors and Affiliations

DI, Università degli Studi di Milano, via Comelico, 39, 20135, Milano, Italy
Silvana Castano, Alfio Ferrara & Stefano Montanelli


Corresponding author

Correspondence to Stefano Montanelli.

Editor information

Editors and Affiliations

ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland
Christophe Debruyne
University of Lorraine, Vandoeuvre-les-Nancy, France
Hervé Panetto
TU Graz, Graz, Austria
Robert Meersman
La Trobe University, Melbourne, Australia
Tharam Dillon
Institute of Computer Languages, TU Wien, Vienna, Austria
eva Kühn
ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland
Declan O'Sullivan
Università degli Studi di Milano Crema, Crema, Italy
Claudio Agostino Ardagna


About this paper

Cite this paper

Castano, S., Ferrara, A., Montanelli, S. (2016). Human-in-the-Loop Web Resource Classification. In: Debruyne, C., et al. (eds.) On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol. 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_13


DOI: https://doi.org/10.1007/978-3-319-48472-3_13
Published: 18 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science, Computer Science (R0)
