
Evolutionary Strategy to Perform Batch-Mode Active Learning on Multi-Label Data

Published: 30 January 2018

Abstract

Multi-label learning has become an important area of research owing to the increasing number of real-world problems that contain multi-label data. Data labeling is an expensive process that requires expert handling. The annotation of multi-label data is laborious since a human expert needs to consider the presence or absence of each possible label. Consequently, many modern multi-label problems involve a small number of labeled examples alongside plentiful unlabeled examples. Active learning methods allow us to induce better classifiers by selecting the most useful unlabeled data, thus considerably reducing the labeling effort and the cost of training an accurate model. Batch-mode active learning methods focus on selecting a set of unlabeled examples in each iteration in such a way that the selected examples are informative and as diverse as possible. This article presents a strategy to perform batch-mode active learning on multi-label data. Batch-mode active learning is formulated as a multi-objective problem and solved by means of an evolutionary algorithm. Extensive experiments were conducted on a large collection of datasets, and the experimental results confirmed the effectiveness of our proposal for batch-mode multi-label active learning.
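
The multi-objective view of batch selection described in the abstract can be illustrated with a toy sketch. The abstract does not give the article's objective functions or the details of its evolutionary algorithm, so everything below is an assumption made for illustration: informativeness is taken as the mean of a hypothetical per-example uncertainty score, diversity as the smallest pairwise Euclidean distance inside the batch, and candidate batches are enumerated exhaustively instead of being evolved. The function names (`batch_objectives`, `dominates`, `pareto_front`) are hypothetical.

```python
import itertools
import math


def batch_objectives(batch, uncertainty, points):
    """Score a candidate batch (tuple of example indices) on two objectives.

    Both objectives are illustrative assumptions, not the article's definitions.
    """
    # Informativeness: mean uncertainty of the examples in the batch.
    info = sum(uncertainty[i] for i in batch) / len(batch)
    # Diversity: smallest pairwise distance between examples in the batch.
    div = min(
        math.dist(points[a], points[b])
        for a, b in itertools.combinations(batch, 2)
    )
    return info, div


def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives maximised)."""
    return all(a >= b for a, b in zip(f, g)) and any(a > b for a, b in zip(f, g))


def pareto_front(candidates, uncertainty, points):
    """Return the candidate batches that no other candidate dominates."""
    scored = [(b, batch_objectives(b, uncertainty, points)) for b in candidates]
    return [b for b, f in scored if not any(dominates(g, f) for _, g in scored)]


# Toy pool of four unlabeled examples (made-up feature vectors and scores).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (2.0, 0.0)]
uncertainty = [0.9, 0.8, 0.5, 0.2]

# Exhaustive enumeration of batches of size 2 stands in for the evolutionary
# search, which would explore this space heuristically on realistic pool sizes.
candidates = list(itertools.combinations(range(len(points)), 2))
front = pareto_front(candidates, uncertainty, points)
print(front)
```

On this toy pool the non-dominated batches trade informativeness against diversity: the most uncertain pair is also the most redundant, while the most spread-out pair includes a low-uncertainty example. An evolutionary algorithm is a common way to approximate such a front when enumerating every batch is infeasible.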

Supplementary Material

a46-reyes-apndx.pdf (reyes.zip)
Supplemental movie, appendix, image, and software files for "Evolutionary Strategy to Perform Batch-Mode Active Learning on Multi-Label Data"



Published In

ACM Transactions on Intelligent Systems and Technology, Volume 9, Issue 4
Research Survey and Regular Papers
July 2018
280 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3183892
  • Editor: Yu Zheng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 January 2018
Accepted: 01 November 2017
Revised: 01 October 2017
Received: 01 September 2017
Published in TIST Volume 9, Issue 4


Author Tags

  1. Batch-mode active learning
  2. evolutionary algorithm
  3. multi-label learning
  4. multi-objective problem

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 24
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 13 Feb 2025

Cited By

  • (2024) An Unbiased Risk Estimator for Partial Label Learning with Augmented Classes. ACM Transactions on Intelligent Systems and Technology 15:6 (1-22). DOI: 10.1145/3700137. Online publication date: 14-Oct-2024
  • (2024) Instance-Label Based Multi-Label Active Learning by Evolutionary Multi-Objective Optimization. Proceedings of the Genetic and Evolutionary Computation Conference Companion (327-330). DOI: 10.1145/3638530.3654388. Online publication date: 14-Jul-2024
  • (2024) Efficient post-earthquake reconnaissance planning using adaptive batch-mode active learning. Advanced Engineering Informatics 60 (102414). DOI: 10.1016/j.aei.2024.102414. Online publication date: Apr-2024
  • (2023) EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1973-1984). DOI: 10.1145/3611643.3613897. Online publication date: 30-Nov-2023
  • (2023) Active Learning With Complementary Sampling for Instructing Class-Biased Multi-Label Text Emotion Classification. IEEE Transactions on Affective Computing 14:1 (523-536). DOI: 10.1109/TAFFC.2020.3038401. Online publication date: 1-Jan-2023
  • (2023) Extending version-space theory to multi-label active learning with imbalanced data. Pattern Recognition 142 (109690). DOI: 10.1016/j.patcog.2023.109690. Online publication date: Oct-2023
  • (2023) PLVI-CE: a multi-label active learning algorithm with simultaneously considering uncertainty and diversity. Applied Intelligence 53:22 (27844-27864). DOI: 10.1007/s10489-023-05008-2. Online publication date: 19-Sep-2023
  • (2023) MCVIE: An Effective Batch-Mode Active Learning for Multi-label Text Classification. Natural Language Processing and Chinese Computing (337-348). DOI: 10.1007/978-3-031-44693-1_27. Online publication date: 12-Oct-2023
  • (2022) Granular Multilabel Batch Active Learning With Pairwise Label Correlation. IEEE Transactions on Systems, Man, and Cybernetics: Systems 52:5 (3079-3091). DOI: 10.1109/TSMC.2021.3062714. Online publication date: May-2022
  • (2022) CMAL: Cost-Effective Multi-Label Active Learning by Querying Subexamples. IEEE Transactions on Knowledge and Data Engineering 34:5 (2091-2105). DOI: 10.1109/TKDE.2020.3003899. Online publication date: 1-May-2022
