
A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Abstract

Many important real-world classification problems involve a large number of categories. Hierarchical multi-label text classification (HMTC), which aims at high accuracy over large sets of closely related categories organized in a hierarchical structure or taxonomy, has therefore become a challenging problem. In this paper, we present a hierarchical fine-tuning deep learning approach for HMTC, in which a joint embedding of words and their parent categories is generated by leveraging both the textual data and the hierarchical relations in the category structure. A fine-tuning technique is applied to the Ordered Neurons LSTM (ONLSTM) network so that classification results at the upper levels help guide classification at the lower ones. Extensive experiments on two benchmark datasets show that the proposed method outperforms state-of-the-art hierarchical and flat multi-label text classification approaches, in particular by reducing computational cost while achieving superior performance.
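
To make the approach concrete, the sketch below illustrates the two ideas the abstract describes: a joint embedding, in which each document's parent category is embedded into the same space as its words, and level-by-level fine-tuning, in which the encoder trained for one level initialises the encoder for the next so that upper-level decisions guide lower-level ones. This is a minimal PyTorch sketch under stated assumptions, not the authors' released implementation (see Note 5): a plain nn.LSTM stands in for the ONLSTM encoder, each level is treated as single-label for simplicity, and all class names, shapes, and hyperparameters are illustrative.

```python
# Minimal sketch of joint word/parent-category embedding plus hierarchical
# fine-tuning. Assumptions: nn.LSTM as a stand-in for ONLSTM; illustrative
# names and hyperparameters; single-label classification per level.
import torch
import torch.nn as nn


class LevelClassifier(nn.Module):
    """Predicts the category at one taxonomy level from a joint embedding
    of the document's words and its (predicted) parent category."""

    def __init__(self, vocab_size, num_parents, num_classes,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # Parent categories are embedded in the same space as words so the
        # two can be concatenated along the sequence axis (joint embedding).
        self.parent_embed = nn.Embedding(num_parents, embed_dim)
        # Stand-in for the ONLSTM encoder of Shen et al. (ICLR 2019).
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, parent_ids):
        words = self.word_embed(token_ids)                   # (B, T, E)
        parent = self.parent_embed(parent_ids).unsqueeze(1)  # (B, 1, E)
        joint = torch.cat([parent, words], dim=1)            # prepend parent
        _, (h, _) = self.encoder(joint)
        return self.head(h[-1])                              # (B, C)


def finetune_top_down(level_models, level_loaders, epochs=1, lr=1e-3):
    """Trains one classifier per taxonomy level, top to bottom. Each
    lower-level encoder is initialised from the level above (hierarchical
    fine-tuning); each batch carries the parent ids assigned at that level."""
    prev = None
    for model, loader in zip(level_models, level_loaders):
        if prev is not None:
            model.encoder.load_state_dict(prev.encoder.state_dict())
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for token_ids, parent_ids, labels in loader:
                opt.zero_grad()
                loss = loss_fn(model(token_ids, parent_ids), labels)
                loss.backward()
                opt.step()
        prev = model
```

At the top level, parent_ids can simply be a shared "root" index; at lower levels they come from the previous level's predictions (or gold parent labels during training), which is how upper-level results feed the lower-level classifiers.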


Notes

  1. https://www.dmoz-odp.org/.

  2. https://meshb.nlm.nih.gov/treeView.

  3. https://www.loc.gov/aba/cataloging/classification/.

  4. https://en.wikipedia.org/wiki/Portal:Contents/Categories.

  5. github.com/masterzjp/HFT-ONLSTM.


Acknowledgments

This work is partially supported by the National Key R&D Program of China under grants 2018YFC0830605 and 2018YFC0831404.

Author information

Correspondence to Yinglong Ma.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ma, Y., Zhao, J., Jin, B. (2020). A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol. 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_60


  • DOI: https://doi.org/10.1007/978-3-030-61616-8_60


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
