DOI: 10.1145/1015330.1015374
Article

Large margin hierarchical classification

Published: 04 July 2004

Abstract

We present an algorithmic framework for supervised classification learning where the set of labels is organized in a predefined hierarchical structure. This structure is encoded by a rooted tree, which induces a metric over the label set. Our approach combines ideas from large margin kernel methods and Bayesian analysis. Following the large margin principle, we associate a prototype with each label in the tree and formulate the learning task as an optimization problem with varying margin constraints. In the spirit of Bayesian methods, we impose similarity requirements between the prototypes corresponding to adjacent labels in the hierarchy. We describe new online and batch algorithms for solving the constrained optimization problem. We derive a worst-case loss bound for the online algorithm and provide a generalization analysis for its batch counterpart. We demonstrate the merits of our approach with a series of experiments on synthetic, text, and speech data.
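
As an illustration of how a rooted tree can induce a metric over a label set, the following minimal sketch computes the number of edges on the path between two labels. This is one natural reading of the tree-induced metric mentioned in the abstract, not necessarily the exact definition used in the paper. The `parent` map, the function names, and the toy phoneme-style hierarchy are all hypothetical and chosen only for this example.

```python
def path_to_root(parent, label):
    """Return the labels on the path from `label` up to the root, inclusive."""
    path = [label]
    while parent[label] is not None:
        label = parent[label]
        path.append(label)
    return path


def tree_distance(parent, a, b):
    """Number of edges on the unique path between labels `a` and `b`."""
    # Map each ancestor of `a` (including `a` itself) to its distance from `a`.
    steps_from_a = {node: i for i, node in enumerate(path_to_root(parent, a))}
    # Walk up from `b` until the first shared ancestor is found.
    for j, node in enumerate(path_to_root(parent, b)):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("labels do not share a root")


# Hypothetical toy hierarchy (assumption: each label maps to its parent,
# and the root maps to None).
parent = {
    "root": None,
    "vowel": "root",
    "consonant": "root",
    "a": "vowel",
    "e": "vowel",
    "p": "consonant",
}

print(tree_distance(parent, "a", "e"))  # siblings under "vowel"
print(tree_distance(parent, "a", "p"))  # labels in different subtrees
```

Under a distance of this kind, confusing two sibling labels costs less than confusing labels from different subtrees, which is the sort of graded error that margin constraints varying with the hierarchy could reflect.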



Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004
934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
