
Crowd-Powered Systems to Diminish the Effects of Semantic Drift

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Abstract

The Internet and the social Web have made it possible to acquire information to feed a growing number of Machine Learning (ML) applications. They have also drawn attention to crowdsourcing approaches, which are commonly applied to problems that are easy for humans but difficult for computers, and which form the basis of crowd-powered systems. In this work, we consider the issue of semantic drift in a bootstrap learning algorithm and propose the novel idea of a crowd-powered approach to diminish its effects. To test this idea, we built a hybrid of the Coupled Pattern Learner (CPL), a bootstrap learning algorithm that extracts contextual patterns from unstructured text, and SSCrowd, a component that allows conversation between learning systems and Web users. The hybrid actively and autonomously seeks human supervision by asking people to take part in the knowledge acquisition process, thus using the intelligence of the crowd to improve the learning capabilities of CPL. We take advantage of the ease with which humans understand language in unstructured text, and we show the results of using a hybrid crowd-powered approach to diminish the effects of semantic drift.
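To make the notion of semantic drift concrete, the following is a minimal toy sketch of a bootstrap loop with an optional human-validation hook. It is not the CPL algorithm and does not use the SSCrowd interface; the function `bootstrap`, the `ask_crowd` callback, and the tiny `corpus` of (noun phrase, context) pairs are all hypothetical names and data introduced only for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical toy corpus: (noun phrase, textual context) pairs.
corpus = [
    ("Paris", "is a city in"),
    ("Berlin", "is a city in"),
    ("Berlin", "is located in"),
    ("Mount Everest", "is located in"),
    ("Mount Everest", "was first climbed in"),
    ("K2", "was first climbed in"),
]

def bootstrap(
    pairs: Iterable[Tuple[str, str]],
    seeds: Set[str],
    ask_crowd: Callable[[str], bool],
    rounds: int = 3,
) -> Set[str]:
    """Toy bootstrap learner for a single category (here: cities).

    Each round learns contexts from known instances, extracts new
    candidate instances from those contexts, and keeps only the
    candidates that the (simulated) crowd accepts.
    """
    pairs = list(pairs)
    instances = set(seeds)
    for _ in range(rounds):
        # Contexts that co-occur with an already accepted instance.
        patterns = {ctx for inst, ctx in pairs if inst in instances}
        # New noun phrases that occur in one of those contexts.
        candidates = {inst for inst, ctx in pairs if ctx in patterns} - instances
        # Human supervision step: keep only validated candidates.
        instances |= {c for c in candidates if ask_crowd(c)}
    return instances

# Without supervision, the ambiguous context "is located in" lets the
# category drift from cities to mountains.
unsupervised = bootstrap(corpus, {"Paris"}, ask_crowd=lambda c: True)

# With a simulated crowd that only confirms real cities, the drift stops.
known_cities = {"Paris", "Berlin"}
supervised = bootstrap(corpus, {"Paris"}, ask_crowd=lambda c: c in known_cities)

print(sorted(unsupervised))  # ['Berlin', 'K2', 'Mount Everest', 'Paris']
print(sorted(supervised))    # ['Berlin', 'Paris']
```

In this sketch a single wrong promotion ("Mount Everest") introduces a new context that pulls in further errors ("K2"), which is the compounding behaviour that human validation is meant to interrupt.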




Author information

Corresponding author

Correspondence to Saulo D. S. Pedro.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pedro, S.D.S., Hruschka, E.R. (2019). Crowd-Powered Systems to Diminish the Effects of Semantic Drift. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_59

  • DOI: https://doi.org/10.1007/978-3-030-29859-3_59

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29858-6

  • Online ISBN: 978-3-030-29859-3

  • eBook Packages: Computer Science, Computer Science (R0)
