Active Learning with Crowdsourcing for the Cold Start of Imbalanced Classifiers

Brangbour, Etienne; Bruneau, Pierrick; Tamisier, Thomas; Marchand-Maillet, Stéphane

doi:10.1007/978-3-030-60816-3_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12341)

Included in the following conference series:

International Conference on Cooperative Design, Visualization and Engineering

Abstract

We present a novel cooperative strategy based on active learning and crowdsourcing that addresses the cold start stage, i.e. initializing the classification of a large set of data with no attached labels. The strategy is also designed to handle an imbalanced context, in which random selection is highly inefficient. To this end, our method is guided by an unsupervised clustering and by cluster quality and impurity indexes, which are updated at each active learning step. The strategy is illustrated with a case study of annotating Twitter content with respect to a real flood event. We also show that our technique can cope with multiple heterogeneous data representations.
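To make the idea of cluster-guided selection concrete, the following is a minimal sketch and not the authors' algorithm: the abstract does not define its quality and impurity indexes, so this example assumes Gini impurity over the labels collected so far in each cluster and leaves out any cluster quality index. The function names (`gini_impurity`, `pick_next`, `run`), the toy cluster assignment, and the simulated annotator `oracle` are all hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of labels; 0.0 means all labels agree."""
    if not labels:
        # Assumption: a cluster with no labels yet is treated as maximally uncertain.
        return 1.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def pick_next(clusters, labeled):
    """Return the index of the next item to annotate, or None if all are labeled.

    clusters: list of cluster ids, one per item (from any unsupervised clustering)
    labeled:  dict mapping item index -> label collected so far
    """
    by_cluster = {}
    for idx, cid in enumerate(clusters):
        by_cluster.setdefault(cid, []).append(idx)

    best = None
    for cid in sorted(by_cluster):
        members = by_cluster[cid]
        unlabeled = [i for i in members if i not in labeled]
        if not unlabeled:
            continue
        seen = [labeled[i] for i in members if i in labeled]
        # Prefer impure clusters; break ties in favor of clusters with fewer labels.
        score = (gini_impurity(seen), -len(seen))
        if best is None or score > best[0]:
            best = (score, unlabeled[0])
    return None if best is None else best[1]


def run(clusters, oracle, budget):
    """Query the oracle `budget` times, recomputing impurity at every step."""
    labeled = {}
    for _ in range(budget):
        idx = pick_next(clusters, labeled)
        if idx is None:
            break
        labeled[idx] = oracle(idx)
    return labeled


# Toy data: two clusters, with the minority class (1) found only in cluster 1.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
truth = [0, 0, 0, 0, 0, 1, 0, 1]
labeled = run(clusters, oracle=lambda i: truth[i], budget=5)
print(sorted(labeled.items()))
```

In this toy run, once a minority label turns up in cluster 1 its impurity rises and later queries concentrate there, which is the kind of behavior that makes cluster-guided selection more attractive than random sampling when classes are imbalanced.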

Acknowledgements

This work was performed in the context of the Publimape project, funded by the CORE programme of the Luxembourgish National Research Fund (FNR).

Author information

Authors and Affiliations

Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
Etienne Brangbour, Pierrick Bruneau & Thomas Tamisier
University of Geneva, Geneva, Switzerland
Etienne Brangbour & Stéphane Marchand-Maillet

Corresponding author

Correspondence to Etienne Brangbour.

Editor information

Editors and Affiliations

University of the Balearic Islands, Palma, Mallorca, Spain
Yuhua Luo

About this paper

Cite this paper

Brangbour, E., Bruneau, P., Tamisier, T., Marchand-Maillet, S. (2020). Active Learning with Crowdsourcing for the Cold Start of Imbalanced Classifiers. In: Luo, Y. (eds) Cooperative Design, Visualization, and Engineering. CDVE 2020. Lecture Notes in Computer Science, vol 12341. Springer, Cham. https://doi.org/10.1007/978-3-030-60816-3_22

DOI: https://doi.org/10.1007/978-3-030-60816-3_22
Published: 16 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60815-6
Online ISBN: 978-3-030-60816-3
eBook Packages: Computer Science, Computer Science (R0)
