Large margin hierarchical classification

Authors:

Yoram Singer

ICML '04: Proceedings of the twenty-first international conference on Machine learning

Page 27

https://doi.org/10.1145/1015330.1015374

Published: 04 July 2004

Abstract

We present an algorithmic framework for supervised classification learning where the set of labels is organized in a predefined hierarchical structure. This structure is encoded by a rooted tree which induces a metric over the label set. Our approach combines ideas from large margin kernel methods and Bayesian analysis. Following the large margin principle, we associate a prototype with each label in the tree and formulate the learning task as an optimization problem with varying margin constraints. In the spirit of Bayesian methods, we impose similarity requirements between the prototypes corresponding to adjacent labels in the hierarchy. We describe new online and batch algorithms for solving the constrained optimization problem. We derive a worst case loss-bound for the online algorithm and provide generalization analysis for its batch counterpart. We demonstrate the merits of our approach with a series of experiments on synthetic, text and speech data.
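
The abstract's tree-induced metric and distance-dependent margins can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy hierarchy `PARENT`, the helper names `path_to_root`, `tree_distance` and `violates_margin`, and the choice of the square root of the tree distance as the required margin. The metric is taken to be the number of edges on the path between two labels.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: each label maps to its parent; the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "vowel": "root",
    "consonant": "root",
    "a": "vowel",
    "e": "vowel",
    "t": "consonant",
}


def path_to_root(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the labels from `label` up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(u: str, v: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges on the unique path between u and v in the tree."""
    steps_from_u = {node: i for i, node in enumerate(path_to_root(u, parent))}
    for steps_from_v, node in enumerate(path_to_root(v, parent)):
        if node in steps_from_u:
            # First shared ancestor found while walking up from v.
            return steps_from_u[node] + steps_from_v
    raise ValueError("labels are not in the same tree")


def violates_margin(scores: Dict[str, float], correct: str, wrong: str,
                    parent: Dict[str, Optional[str]]) -> bool:
    """Check a distance-dependent margin between two label scores.

    Assumption for illustration: the required margin is the square root of
    the tree distance, so confusing distant labels is penalized more.
    """
    required = math.sqrt(tree_distance(correct, wrong, parent))
    return scores[correct] - scores[wrong] < required
```

In this toy tree, sibling labels such as "a" and "e" are two edges apart, while "a" and "t" are four edges apart, so under the assumed rule a score gap that is large enough to separate the siblings may still fall short for the more distant pair.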

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning

July 2004

934 pages

ISBN: 1581138385

DOI: 10.1145/1015330

Conference Chair:
Carla Brodley
Purdue University/Tufts University

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
