Abstract
Many important real-world classification problems involve a large number of categories. Hierarchical multi-label text classification (HMTC), which must maintain high accuracy over large sets of closely related categories organized in a hierarchical structure or taxonomy, has become a challenging problem. In this paper, we present a hierarchical fine-tuning deep learning approach for HMTC, in which a joint embedding of words and their parent categories is generated by leveraging both the hierarchical relations in the category taxonomy and the textual data. A fine-tuning technique is applied to the Ordered Neurons LSTM (ON-LSTM) network so that classification results at the upper levels help guide classification at the lower ones. Extensive experiments on two benchmark datasets show that the proposed method outperforms state-of-the-art hierarchical and flat multi-label text classification approaches, particularly in reducing computational cost while achieving superior performance.
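The core idea of the joint embedding can be illustrated with a minimal sketch: each word vector is concatenated with the embedding of the document's parent category, so that lower-level classifiers receive taxonomy-aware inputs. The vocabulary, category names, and dimensions below are hypothetical, chosen only to show the shape of the construction, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and top-level (parent) categories, for illustration only.
vocab = {"neural": 0, "network": 1, "text": 2, "label": 3}
parent_categories = {"CS": 0, "Biology": 1}

EMB_DIM, CAT_DIM = 8, 4  # illustrative embedding sizes
word_emb = rng.normal(size=(len(vocab), EMB_DIM))
cat_emb = rng.normal(size=(len(parent_categories), CAT_DIM))

def joint_embed(tokens, parent):
    """Concatenate each word vector with the parent-category vector
    predicted (or known) at the upper level of the hierarchy."""
    w = word_emb[[vocab[t] for t in tokens]]                    # (seq_len, EMB_DIM)
    c = np.tile(cat_emb[parent_categories[parent]], (len(tokens), 1))
    return np.concatenate([w, c], axis=1)                       # (seq_len, EMB_DIM + CAT_DIM)

x = joint_embed(["neural", "text"], "CS")
print(x.shape)  # (2, 12)
```

In the full model, a sequence of such joint vectors would feed an ON-LSTM classifier at the next level of the hierarchy, with upper-level parameters fine-tuned rather than retrained from scratch.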
Acknowledgments
This work is partially supported by the National Key R&D Program of China under grants 2018YFC0830605 and 2018YFC0831404.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Ma, Y., Zhao, J., Jin, B. (2020). A Hierarchical Fine-Tuning Approach Based on Joint Embedding of Words and Parent Categories for Hierarchical Multi-label Text Classification. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science(), vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_60
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8