DOI: 10.1145/2783258.2783298
Research article, KDD Conference Proceedings

BatchRank: A Novel Batch Mode Active Learning Framework for Hierarchical Classification

Published: 10 August 2015

Abstract

Active learning algorithms automatically identify the salient, exemplar instances in large amounts of unlabeled data and thus reduce the human annotation effort needed to induce a classification model. More recently, Batch Mode Active Learning (BMAL) techniques have been proposed, in which a batch of data samples is selected simultaneously from the unlabeled set. Most active learning algorithms assume a flat label space, that is, they treat the class labels as independent. In many applications, however, the class labels are organized in a hierarchical tree structure, with the leaf nodes as outputs and the internal nodes as clusters of outputs at multiple levels of granularity. In this paper, we propose a novel BMAL algorithm (BatchRank) for hierarchical classification. Sample selection is posed as an NP-hard integer quadratic programming problem, and we derive a convex relaxation based on linear programming whose solution is further improved by an iterative truncated power method. Finally, we establish a deterministic bound on the quality of the solution. Our empirical results on several challenging real-world datasets from multiple domains corroborate the potential of the proposed framework for real-world hierarchical classification applications.
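The abstract poses batch selection as an integer quadratic program and refines a relaxed solution with a truncated power method. The Python sketch below illustrates only that refinement step, under stated assumptions: it assumes a generic cardinality-constrained objective, maximize x^T Q x over x in {0,1}^n with sum(x) = k, where the hypothetical matrix Q stores per-sample informativeness on its diagonal and pairwise redundancy penalties off the diagonal. It does not reproduce BatchRank's hierarchical objective, its LP relaxation, or its bound. The update is a binary variant of the truncated power iteration of Yuan and Zhang (JMLR 2013): multiply the current selection by Q, keep the k largest entries, and set them to 1. All function names are hypothetical.

```python
# Hypothetical sketch: binary truncated power iteration for picking a batch of k
# samples that approximately maximizes x^T Q x with x in {0,1}^n and sum(x) = k.
# Q is an assumed utility/redundancy matrix, not BatchRank's actual objective.
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(Q: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product Q @ x in pure Python."""
    return [sum(q * xj for q, xj in zip(row, x)) for row in Q]


def objective(Q: Matrix, x: Sequence[float]) -> float:
    """Quadratic batch score x^T Q x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(Q, x)))


def top_k_indicator(v: Sequence[float], k: int) -> List[float]:
    """0/1 vector marking the k largest entries of v (ties go to the lower index)."""
    order = sorted(range(len(v)), key=lambda i: (-v[i], i))
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(v))]


def truncated_power_select(
    Q: Matrix, k: int, x0: Optional[Sequence[float]] = None, max_iter: int = 100
) -> List[int]:
    """Return the indices of a size-k batch found by truncated power iteration."""
    n = len(Q)
    if not 0 < k <= n or max_iter < 1:
        raise ValueError("need 0 < k <= n and max_iter >= 1")
    # Assumed default start: the all-ones vector, so the first step ranks row sums.
    # A relaxed (for example LP-based) solution can be passed as x0 instead.
    x = list(x0) if x0 is not None else [1.0] * n
    for _ in range(max_iter):
        # Power step (Q @ x) followed by truncation to the k largest entries.
        x_new = top_k_indicator(matvec(Q, x), k)
        if x_new == x:  # fixed point: the batch no longer changes
            break
        x = x_new
    return [i for i, xi in enumerate(x) if xi == 1.0]
```

With Q = [[4, -3, 0], [-3, 4, 0], [0, 0, 3]] and k = 2, samples 0 and 1 score highest individually but are redundant together (x^T Q x = 2), while truncated_power_select(Q, 2) returns [0, 2] with score 7. Started instead from x0 = [1, 1, 0], the iteration stays at [0, 1], which shows why the quality of the starting point matters; the abstract states that BatchRank refines the solution of its LP relaxation. When Q is positive semidefinite, the score cannot decrease from one 0/1 batch to the next, because x^T Q x is convex and each step maximizes its linearization over all size-k batches.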

Supplementary Material

MP4 File (p99.mp4)

    Published In

    KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. active learning
    2. hierarchical classification
    3. optimization

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 12
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2021) iMatching: An interactive map-matching system. Neurocomputing 444, 126-135. DOI: 10.1016/j.neucom.2020.04.155. Online publication date: Jul 2021.
    • (2020) Addressing the Item Cold-Start Problem by Attribute-Driven Active Learning. IEEE Transactions on Knowledge and Data Engineering 32:4, 631-644. DOI: 10.1109/TKDE.2019.2891530. Online publication date: 1 Apr 2020.
    • (2020) Active learning for hierarchical multi-label classification. Data Mining and Knowledge Discovery. DOI: 10.1007/s10618-020-00704-w. Online publication date: 17 Jul 2020.
    • (2019) Context Aware Image Annotation in Active Learning with Batch Mode. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 952-953. DOI: 10.1109/COMPSAC.2019.00157. Online publication date: Jul 2019.
    • (2018) Cost-effective active learning for hierarchical multi-label classification. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2962-2968. DOI: 10.5555/3304889.3305072. Online publication date: 13 Jul 2018.
    • (2018) Online Adaptive Asymmetric Active Learning for Budgeted Imbalanced Data. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2768-2777. DOI: 10.1145/3219819.3219948. Online publication date: 19 Jul 2018.
    • (2016) Local-based active classification of test report to assist crowdsourced testing. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 190-201. DOI: 10.1145/2970276.2970300. Online publication date: 25 Aug 2016.
