TreeBoost.MH: A Boosting Algorithm for Multi-label Hierarchical Text Categorization

  • Conference paper
String Processing and Information Retrieval (SPIRE 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4209)

Abstract

In this paper we propose TreeBoost.MH, an algorithm for multi-label Hierarchical Text Categorization (HTC) that is a hierarchical variant of AdaBoost.MH. TreeBoost.MH embodies several intuitions that had previously arisen within HTC, e.g. that both feature selection and the selection of negative training examples should be performed “locally”, i.e. by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated “locally”. We present the results of experiments with TreeBoost.MH on two HTC benchmarks, and analytically discuss its computational cost.
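To make the idea of “local” selection of negative training examples more concrete, the following is a minimal sketch of one common reading of it in HTC: when training the classifier for a category, only documents that belong to the category's parent are considered, so the negatives are the documents of sibling categories. The abstract does not spell out the exact policy used by TreeBoost.MH, so this policy is an assumption, and the toy hierarchy, the document labels, and the function names (`PARENT`, `DOC_LABELS`, `belongs_to`, `local_training_set`) are all hypothetical and for illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy: category -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "sports": None,
    "science": None,
    "soccer": "sports",
    "tennis": "sports",
    "physics": "science",
}

# Hypothetical multi-label training data: document id -> set of leaf categories.
DOC_LABELS: Dict[str, Set[str]] = {
    "d1": {"soccer"},
    "d2": {"tennis"},
    "d3": {"physics"},
    "d4": {"soccer", "physics"},
}


def ancestors_and_self(cat: str) -> Set[str]:
    """Return the category together with all of its ancestors."""
    out: Set[str] = set()
    node: Optional[str] = cat
    while node is not None:
        out.add(node)
        node = PARENT[node]
    return out


def belongs_to(doc: str, cat: str) -> bool:
    """A document belongs to cat if one of its labels is cat or a descendant of cat."""
    return any(cat in ancestors_and_self(label) for label in DOC_LABELS[doc])


def local_training_set(cat: str) -> Tuple[List[str], List[str]]:
    """Assumed 'local' policy: restrict the pool to documents of the parent category.

    Positives are pool documents that belong to cat; negatives are the rest of
    the pool, i.e. documents of sibling categories. Top-level categories use
    the whole training set as their pool.
    """
    parent = PARENT[cat]
    pool = [d for d in sorted(DOC_LABELS) if parent is None or belongs_to(d, parent)]
    positives = [d for d in pool if belongs_to(d, cat)]
    negatives = [d for d in pool if not belongs_to(d, cat)]
    return positives, negatives


if __name__ == "__main__":
    print(local_training_set("soccer"))
    print(local_training_set("sports"))
```

Under this assumed policy, the classifier for `soccer` never sees the `physics`-only document `d3` as a negative example, because `d3` does not belong to the parent category `sports`. The same restriction could in principle be applied to feature selection, by computing feature scores over the local pool only.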

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esuli, A., Fagni, T., Sebastiani, F. (2006). TreeBoost.MH: A Boosting Algorithm for Multi-label Hierarchical Text Categorization. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_2

  • DOI: https://doi.org/10.1007/11880561_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45774-9

  • Online ISBN: 978-3-540-45775-6

  • eBook Packages: Computer Science, Computer Science (R0)
