DOI: 10.1145/3383583.3398538

Hierarchical Document Classification as a Sequence Generation Task

Published: 01 August 2020


Hierarchical classification schemes are an effective and natural way to organize large document collections. However, complex schemes make manual classification time-consuming and require domain experts. Current machine learning approaches for hierarchical classification do not exploit all the information contained in the hierarchical schemes. During training, they do not make full use of the inherent parent-child relation of classes. For example, they neglect to tailor document representations, such as embeddings, to each individual hierarchy level. Our model overcomes these problems by addressing hierarchical classification as a sequence generation task. To this end, our neural network transforms a sequence of input words into a sequence of labels, which represents a path through a tree-structured hierarchy scheme. The evaluation uses a patent corpus that comprises millions of documents and exhibits a complex class hierarchy scheme with high-quality annotations from domain experts. We re-implemented five models from related work and show that our basic model achieves competitive results in comparison with the best approach. A variation of our model that uses the recent Transformer architecture outperforms the other approaches. The error analysis reveals that the encoder of our model has the strongest influence on its classification performance.
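To make the idea of a label sequence as a path through the hierarchy concrete, here is a minimal Python sketch of how a single hierarchical class label could be expanded into a root-to-leaf target sequence. It is an illustration only, not the authors' implementation. It assumes IPC-style patent codes in which a section is one letter, a class is the section plus two digits, and a subclass is the class plus one letter. The function names, the whitespace tokenizer, and the `<eos>` marker are hypothetical choices made for this example.

```python
from typing import List, Tuple

END = "<eos>"  # hypothetical end-of-sequence marker for the target side


def label_to_path(code: str) -> List[str]:
    """Expand a subclass code such as 'H04L' into its root-to-leaf path.

    Assumes an IPC-style layout: section (1 char), class (3 chars),
    subclass (4 chars). Other hierarchy schemes need a different split.
    """
    code = code.strip().upper()
    if len(code) != 4:
        raise ValueError(f"expected a 4-character subclass code, got {code!r}")
    return [code[:1], code[:3], code]


def is_valid_path(path: List[str]) -> bool:
    """Check the parent-child relation: each label must extend its parent."""
    return all(
        child.startswith(parent) and len(child) > len(parent)
        for parent, child in zip(path, path[1:])
    )


def make_training_pair(text: str, code: str) -> Tuple[List[str], List[str]]:
    """Build one (input word sequence, target label sequence) pair."""
    source = text.lower().split()  # naive whitespace tokenizer (assumption)
    target = label_to_path(code) + [END]
    return source, target


if __name__ == "__main__":
    src, tgt = make_training_pair("Transmission of digital information", "H04L")
    print(src)
    print(tgt)
```

A sequence-to-sequence model trained on such pairs would emit one label per hierarchy level, and a check like `is_valid_path` could be used to reject generated sequences that do not follow a parent-child chain.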

Supplementary Material

MP4 File (3383583.3398538.mp4)
Presentation video







Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
August 2020
611 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



Association for Computing Machinery

New York, NY, United States



Author Tags

  1. deep learning
  2. document classification
  3. hierarchical classification
  4. neural networks
  5. patent documents


  • Research-article


JCDL '20

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%



Article Metrics

  • Downloads (Last 12 months): 35
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 27 Feb 2025



Cited By

  • (2024) Hierarchical Text Classification and Its Foundations: A Review of Current Research. Electronics 13:7 (1199). DOI: 10.3390/electronics13071199. Online publication date: 25-Mar-2024
  • (2024) Multi-model Collaboration and Prompt-driven Patent Classification Methods. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms (332-336). DOI: 10.1145/3690407.3690464. Online publication date: 21-Jun-2024
  • (2024) Adaptive micro- and macro-knowledge incorporation for hierarchical text classification. Expert Systems with Applications 248 (123374). DOI: 10.1016/j.eswa.2024.123374. Online publication date: Aug-2024
  • (2022) Constrained Sequence-to-Tree Generation for Hierarchical Text Classification. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (1865-1869). DOI: 10.1145/3477495.3531765. Online publication date: 6-Jul-2022
  • (2021) Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification. 2021 IEEE International Conference on Data Mining (ICDM) (757-766). DOI: 10.1109/ICDM51629.2021.00087. Online publication date: Dec-2021
  • (2021) Comparison and Analysis of Embedding Methods for Patent Documents. 2021 IEEE International Conference on Big Data and Smart Computing (BigComp) (152-155). DOI: 10.1109/BigComp51126.2021.00037. Online publication date: Jan-2021
  • (2021) A Multi-task Approach to Neural Multi-label Hierarchical Patent Classification Using Transformers. Advances in Information Retrieval (513-528). DOI: 10.1007/978-3-030-72113-8_34. Online publication date: 27-Mar-2021
  • (2020) MEXN: Multi-Stage Extraction Network for Patent Document Classification. Applied Sciences 10:18 (6229). DOI: 10.3390/app10186229. Online publication date: 8-Sep-2020
