Human-in-the-Loop Web Resource Classification

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2016 Conferences (OTM 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10033)

Abstract

Engaging humans in classification tasks has been shown to be effective, especially for digital resources such as images or multimedia web resources, whose complex features are hard to abstract for an automated procedure. In this paper, we propose the \(\mathsf {HC^2}\) crowdclustering approach for unsupervised classification of web resources, in which the classification categories dynamically emerge from the crowd. In \(\mathsf {HC^2}\), crowd workers actively participate in clustering activities (i) by resolving tasks in which they are asked to visually recognize groups of similar resources and (ii) by labeling the recognized clusters with prominent keywords. For flexibility, \(\mathsf {HC^2}\) can be interactively configured to set the balance between human engagement and automated procedures in cluster formation, according to the kind and nature of the resources to be classified. For experimentation and evaluation, \(\mathsf {HC^2}\) has been deployed on the Argo platform, which provides crowdsourcing techniques for consensus-based task execution.
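
The abstract mentions consensus-based task execution on Argo, and Note 2 states that a worker's trustworthiness is increased when the worker's answer contributes to reaching consensus and decreased otherwise (details in [4]). The following is a minimal Python sketch of that idea under our own assumptions: trust-weighted voting, a hypothetical consensus threshold of 0.6, a hypothetical fixed update step of 0.1, and an assumed initial trust of 0.5. `Worker` and `resolve_task` are illustrative names, not part of Argo, and the actual trustworthiness model in [4] may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Worker:
    """A crowd worker with a trustworthiness score in [0, 1] (hypothetical model)."""
    name: str
    trust: float = 0.5  # assumed initial trustworthiness


def resolve_task(answers, workers, threshold=0.6, step=0.1):
    """Resolve one task by trust-weighted voting.

    answers: dict mapping worker name -> answer (any hashable value).
    workers: dict mapping worker name -> Worker.
    Returns the consensus answer, or None if no answer gathers at least
    `threshold` of the total trust weight (the task would then stay open).
    """
    weight = defaultdict(float)
    for name, answer in answers.items():
        weight[answer] += workers[name].trust
    total = sum(weight.values())
    if total == 0:
        return None
    best, best_weight = max(weight.items(), key=lambda kv: kv[1])
    if best_weight / total < threshold:
        return None  # no consensus yet: leave trust scores unchanged
    # Reward workers whose answer matches the consensus, penalise the others,
    # keeping each score clamped to [0, 1].
    for name, answer in answers.items():
        worker = workers[name]
        delta = step if answer == best else -step
        worker.trust = min(1.0, max(0.0, worker.trust + delta))
    return best


workers = {n: Worker(n) for n in ("ann", "bob", "eve")}
print(resolve_task({"ann": "A", "bob": "A", "eve": "B"}, workers))  # A
print({n: round(w.trust, 2) for n, w in workers.items()})  # {'ann': 0.6, 'bob': 0.6, 'eve': 0.4}
```

In this sketch, trust is only updated once consensus is reached; updating it on every answer, or with a step that depends on the margin of agreement, would be equally plausible readings of Note 2.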

Notes

  1. Depending on the domain of the considered resources, the WordNet lexical system can be replaced by other kinds of supporting knowledge bases, such as folksonomies, shared vocabularies, and domain ontologies.

  2. Worker trustworthiness is set on the basis of the task answers provided by the worker: it is increased when the worker's answer contributes to reaching consensus and decreased otherwise. See [4] for further details on the specification of worker trustworthiness.

  3. http://island.ricerca.di.unimi.it/projects/argo/ (in Italian).

  4. Other merging strategies are possible in hierarchical clustering (e.g., single-link and average-link). In this experimentation, the complete-link strategy was selected because it provided the best clustering results.
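
Note 4 refers to hierarchical clustering with complete-link merging, contrasted with single-link and average-link. A naive Python sketch of agglomerative clustering with a selectable linkage follows. The distance-threshold stopping rule, the function name `agglomerate`, and the toy distances are illustrative assumptions, not the paper's implementation or data. The sketch recomputes every inter-cluster distance at each step, so it is only suitable for small inputs.

```python
from itertools import combinations


def agglomerate(dist, threshold, linkage="complete"):
    """Bottom-up hierarchical clustering over a pairwise distance matrix.

    dist: symmetric list of lists, dist[i][j] = distance between items i and j.
    threshold: stop merging once the closest pair of clusters is farther apart.
    linkage: "single" (minimum), "complete" (maximum) or "average" distance
             between the members of two clusters.
    Returns the final clusters as sorted lists of item indices.
    """
    aggregate = {
        "single": min,
        "complete": max,
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        return aggregate([dist[i][j] for i in a for j in b])

    while len(clusters) > 1:
        # Closest pair of clusters under the chosen linkage (first one wins ties).
        (x, y), d = min(
            (((p, q), cluster_distance(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Four items on a line at positions 0, 1, 2 and 10.
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]
print(agglomerate(dist, threshold=1.5, linkage="complete"))  # [[0, 1], [2], [3]]
print(agglomerate(dist, threshold=1.5, linkage="single"))    # [[0, 1, 2], [3]]
```

The toy run shows why the strategy matters: single-link joins item 2 through its nearest neighbour, forming a chain, whereas complete-link only merges clusters whose members are all close to each other.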

References

  1. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)

  2. André, P., Kittur, A., Dow, S.P.: Crowd synthesis: extracting categories and clusters from complex data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work, Baltimore, MD, USA (2014)

  3. Barowy, D.W., Curtsinger, C., Berger, E.D., McGregor, A.: AutoMan: a platform for integrating human-based and digital computation. In: Proceedings of the 27th Annual ACM SIGPLAN OOPSLA Conference, Tucson, AZ, USA (2012)

  4. Castano, S., Ferrara, A., Genta, L., Montanelli, S.: Combining crowd consensus and user trustworthiness for managing collective tasks. Future Gener. Comput. Syst. 54, 378–388 (2016)

  5. Chae, G., Park, J., Park, J., Yeo, W.S., Shi, C.: Linking and clustering artworks using social tags: revitalizing crowd-sourced information on cultural collections. J. Assoc. Inf. Sci. Technol. 67(4), 885–899 (2015)

  6. Chen, Q., Wang, G., Tan, C.L.: Web image organization and object discovery by actively creating visual clusters through crowdsourcing. In: Proceedings of the 24th International Conference on Tools with Artificial Intelligence, Toronto, Canada (2012)

  7. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: Proceedings of Computer Vision and Pattern Recognition, San Diego, CA, USA (2005)

  8. Ferrara, A., Genta, L., Montanelli, S., Castano, S.: Dimensional clustering of linked data: techniques and applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., De Antonellis, V., De Virgilio, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. LNCS, vol. 8990, pp. 55–86. Springer, Heidelberg (2015)

  9. Gomes, R.G., Welinder, P., Krause, A., Perona, P.: Crowdclustering. In: Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), Granada, Spain (2011)

  10. Lee, J., Cho, H., Park, J.W., Cha, Y.R., Hwang, S.W., Nie, Z., Wen, J.R.: Hybrid entity clustering using crowds and data. VLDB J. 22(5), 711–726 (2013)

  11. Machedon, R., Rand, W., Joshi, Y.: Automatic crowdsourcing-based classification of marketing messaging on Twitter. In: Proceedings of the International Conference on Social Computing, Washington, DC, USA (2013)

  12. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

  13. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada (2001)

  14. Sun, C., Rampalli, N., Yang, F., Doan, A.: Chimera: large-scale classification using machine learning, rules, and crowdsourcing. Proc. VLDB Endow. 7(13), 1529–1540 (2014)

  15. Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. Proc. VLDB Endow. 5(11), 1483–1494 (2012)

  16. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico (1994)

  17. Yi, J., Jin, R., Jain, S., Yang, T., Jain, A.K.: Semi-crowdsourced clustering: generalizing crowd labeling by robust distance metric learning. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA (2012)

Acknowledgements

The authors would like to thank Riccardo Corbella, M.Sc., for his valuable contribution to the \(\mathsf {HC^2}\) specification and experimentation.

Author information

Corresponding author

Correspondence to Stefano Montanelli.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Castano, S., Ferrara, A., Montanelli, S. (2016). Human-in-the-Loop Web Resource Classification. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol. 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_13

  • DOI: https://doi.org/10.1007/978-3-319-48472-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48471-6

  • Online ISBN: 978-3-319-48472-3

  • eBook Packages: Computer Science, Computer Science (R0)
