Crowd-Powered Systems to Diminish the Effects of Semantic Drift

Pedro, Saulo D. S.; Hruschka, Estevam R.

doi:10.1007/978-3-030-29859-3_59

Saulo D. S. Pedro & Estevam R. Hruschka Jr.

Conference paper
First Online: 26 August 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Abstract

The Internet and the social Web have made it possible to acquire information to feed a growing number of Machine Learning (ML) applications. They have also drawn attention to crowdsourcing approaches, commonly applied to problems that are easy for humans but difficult for computers to solve, giving rise to crowd-powered systems. In this work, we consider the issue of semantic drift in a bootstrap learning algorithm and propose the novel idea of a crowd-powered approach to diminish its effects. To test this idea, we built a hybrid of the Coupled Pattern Learner (CPL), a bootstrap learning algorithm that extracts contextual patterns from unstructured text, and SSCrowd, a component that enables conversation between learning systems and Web users. The hybrid actively and autonomously seeks human supervision by asking people to take part in the knowledge acquisition process, thus using the intelligence of the crowd to improve the learning capabilities of CPL. We take advantage of the ease with which humans understand language in unstructured text, and we show the results of using a hybrid crowd-powered approach to diminish the effects of semantic drift.
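
As a rough illustration of the idea described in the abstract, the Python sketch below shows a toy bootstrap learner that promotes new instances from contextual patterns, with an optional human check on each candidate. It is not CPL or SSCrowd: the corpus, the function names (`slot_pattern`, `match`, `bootstrap`) and the `ask_crowd` callback are all hypothetical, and the crowd is simulated by a simple set lookup.

```python
from typing import Callable, Iterable, List, Optional, Set

# Toy corpus invented for illustration; not data from the paper.
CORPUS: List[str] = [
    "Paris is a city",
    "London is a city",
    "Tokyo is a city",
    "I love Paris",
    "I love pizza",
    "pizza tastes great",
    "pasta tastes great",
]


def slot_pattern(sentence: str, word: str) -> Optional[str]:
    """Turn a sentence into a contextual pattern by replacing `word` with a slot."""
    tokens = sentence.split()
    if word not in tokens:
        return None
    return " ".join("_" if t == word else t for t in tokens)


def match(pattern: str, sentence: str) -> Optional[str]:
    """Return the token filling the slot if `sentence` fits `pattern`, else None."""
    p, s = pattern.split(), sentence.split()
    if len(p) != len(s) or p.count("_") != 1:
        return None
    i = p.index("_")
    return s[i] if p[:i] == s[:i] and p[i + 1:] == s[i + 1:] else None


def bootstrap(
    corpus: Iterable[str],
    seeds: Iterable[str],
    rounds: int,
    ask_crowd: Optional[Callable[[str], bool]] = None,
) -> Set[str]:
    """Grow a set of category instances; `ask_crowd` (hypothetical) vets candidates."""
    corpus = list(corpus)
    instances: Set[str] = set(seeds)
    rejected: Set[str] = set()
    for _ in range(rounds):
        # Learn patterns from the contexts of every instance known so far.
        patterns = {slot_pattern(s, w) for s in corpus for w in instances}
        patterns.discard(None)
        # Apply the patterns to the corpus to propose new instances.
        candidates = {match(p, s) for p in patterns for s in corpus}
        candidates.discard(None)
        candidates -= instances | rejected
        if not candidates:
            break
        for c in sorted(candidates):
            # Without supervision every candidate is promoted, which lets errors spread.
            if ask_crowd is None or ask_crowd(c):
                instances.add(c)
            else:
                rejected.add(c)
    return instances


CITIES = {"Paris", "London", "Tokyo"}  # stands in for human answers
unsupervised = bootstrap(CORPUS, {"Paris"}, rounds=3)
supervised = bootstrap(CORPUS, {"Paris"}, rounds=3, ask_crowd=lambda c: c in CITIES)
print(sorted(unsupervised))
print(sorted(supervised))
```

In this toy setting the ambiguous pattern "I love _" lets a non-city into the unsupervised run, and patterns learned from that error then pull in further unrelated terms, which is the drift effect. Rejecting the first wrong candidate keeps the later patterns from being learned at all.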

Author information

Authors and Affiliations

Universidade Federal de São Carlos (UFSCar), São Carlos, Brazil
Saulo D. S. Pedro & Estevam R. Hruschka Jr.

Corresponding author

Correspondence to Saulo D. S. Pedro.

Editor information

Editors and Affiliations

University of León, León, Spain
Hilde Pérez García
University of León, León, Spain
Lidia Sánchez González
University of León, León, Spain
Manuel Castejón Limas
University of A Coruña, Ferrol, Spain
Héctor Quintián Pardo
University of Salamanca, Salamanca, Spain
Emilio Corchado Rodríguez

About this paper

Cite this paper

Pedro, S.D.S., Hruschka, E.R. (2019). Crowd-Powered Systems to Diminish the Effects of Semantic Drift. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_59

DOI: https://doi.org/10.1007/978-3-030-29859-3_59
Published: 26 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)