research-article

Hierarchical Document Classification as a Sequence Generation Task

Authors:

Ralf Krestel

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020

Pages 147 - 155

https://doi.org/10.1145/3383583.3398538

Published: 01 August 2020

Abstract

Hierarchical classification schemes are an effective and natural way to organize large document collections. However, complex schemes make manual classification time-consuming and require domain experts. Current machine learning approaches for hierarchical classification do not exploit all the information contained in the hierarchical schemes. During training, they do not make full use of the inherent parent-child relation of classes. For example, they neglect to tailor document representations, such as embeddings, to each individual hierarchy level. Our model overcomes these problems by addressing hierarchical classification as a sequence generation task. To this end, our neural network transforms a sequence of input words into a sequence of labels, which represents a path through a tree-structured hierarchy scheme. The evaluation uses a patent corpus that comprises millions of documents and exhibits a complex class hierarchy scheme with high-quality annotations from domain experts. We re-implemented five models from related work and show that our basic model achieves competitive results in comparison with the best approach. A variation of our model that uses the recent Transformer architecture outperforms the other approaches. The error analysis reveals that the encoder of our model has the strongest influence on its classification performance.
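To make the idea of emitting a label path concrete, the following is a minimal sketch, not the model described in the abstract. It assumes a toy hierarchy with patent-style codes and replaces the neural decoder with a hypothetical keyword-counting scorer (`keyword_score`); the names `HIERARCHY`, `KEYWORDS`, and `decode_path` are illustrative assumptions. The point it shows is the decoding loop: each step picks one label, and the candidates at that step are restricted to the children of the label chosen before it, so the output sequence is always a valid root-to-leaf path.

```python
from typing import Callable, Dict, List, Sequence

# Toy tree-structured hierarchy: each label maps to its child labels.
# The codes only imitate patent-style classes; they are illustrative.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["G", "H"],
    "G": ["G06"],
    "H": ["H01", "H04"],
    "G06": ["G06F", "G06N"],
    "H01": ["H01L"],
    "H04": ["H04L", "H04W"],
}

# Hypothetical keyword lists standing in for what a trained model would learn.
KEYWORDS: Dict[str, List[str]] = {
    "G": ["computing", "data"],
    "H": ["signal", "network", "wireless"],
    "G06": ["computing"],
    "H01": ["semiconductor"],
    "H04": ["network", "wireless"],
    "G06F": ["memory", "stores"],
    "G06N": ["neural", "learning"],
    "H01L": ["semiconductor"],
    "H04L": ["packet", "protocol"],
    "H04W": ["wireless"],
}

# A scorer sees the input words, the labels generated so far, and a candidate.
Scorer = Callable[[Sequence[str], Sequence[str], str], float]


def keyword_score(words: Sequence[str], prefix: Sequence[str], label: str) -> float:
    """Stand-in for one decoder step: count keyword hits for `label`.

    `prefix` is unused here; a real decoder would condition on it.
    """
    return float(sum(words.count(k) for k in KEYWORDS.get(label, [])))


def decode_path(
    words: Sequence[str],
    score: Scorer = keyword_score,
    hierarchy: Dict[str, List[str]] = HIERARCHY,
    root: str = "ROOT",
) -> List[str]:
    """Greedily generate a label sequence that forms a path through the tree."""
    path: List[str] = []
    current = root
    # Stop once the current label has no children, i.e. a leaf was reached.
    while current in hierarchy:
        candidates = hierarchy[current]  # only children of the previous label
        current = max(candidates, key=lambda c: score(words, path, c))
        path.append(current)
    return path


if __name__ == "__main__":
    text = "a wireless network node transmits a signal over a wireless channel"
    print(decode_path(text.split()))  # ['H', 'H04', 'H04W']
```

Greedy selection is used here only for brevity; a sequence model could equally use beam search over the same constrained candidate sets.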

Supplementary Material

MP4 File (3383583.3398538.mp4)

Presentation video (1004.28 MB)

Published In

JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020

August 2020

611 pages

ISBN: 9781450375856

DOI: 10.1145/3383583

General Chairs: Ruhua Huang (Wuhan University, China), Dan Wu (Wuhan University, China), Gary Marchionini (University of North Carolina at Chapel Hill, USA)

Program Chairs: Daqing He (University of Pittsburgh, USA), Sally Jo Cunningham (University of Waikato, New Zealand), Preben Hansen (Stockholm University, Sweden)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
IEEE: Institute of Electrical and Electronics Engineers

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020

August 1 - 5, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Article Metrics

Total Citations: 8
Total Downloads: 319
Downloads (Last 12 months): 35
Downloads (Last 6 weeks): 4

Reflects downloads up to 27 Feb 2025

Cited By

Zangari A, Marcuzzo M, Rizzo M, Giudice L, Albarelli A, Gasparetto A. (2024). Hierarchical Text Classification and Its Foundations: A Review of Current Research. Electronics 13:7 (1199). Online publication date: 25-Mar-2024. https://doi.org/10.3390/electronics13071199

Zhang J, Ren H, Guo S, Sun J. (2024). Multi-model Collaboration and Prompt-driven Patent Classification Methods. Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms (332-336). Online publication date: 21-Jun-2024. https://dl.acm.org/doi/10.1145/3690407.3690464

Feng Z, Mao K, Zhou H. (2024). Adaptive micro- and macro-knowledge incorporation for hierarchical text classification. Expert Systems with Applications 248 (123374). Online publication date: Aug-2024. https://doi.org/10.1016/j.eswa.2024.123374

Yu C, Shen Y, Mao Y, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G. (2022). Constrained Sequence-to-Tree Generation for Hierarchical Text Classification. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (1865-1869). Online publication date: 6-Jul-2022. https://dl.acm.org/doi/10.1145/3477495.3531765

Xiao M, Qiao Z, Fu Y, Du Y, Wang P, Zhou Y. (2021). Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification. 2021 IEEE International Conference on Data Mining (ICDM) (757-766). Online publication date: Dec-2021. https://doi.org/10.1109/ICDM51629.2021.00087

Roudsari A, Afshar J, Lee S, Lee W. (2021). Comparison and Analysis of Embedding Methods for Patent Documents. 2021 IEEE International Conference on Big Data and Smart Computing (BigComp) (152-155). Online publication date: Jan-2021. https://doi.org/10.1109/BigComp51126.2021.00037

Pujari S, Friedrich A, Strötgen J. (2021). A Multi-task Approach to Neural Multi-label Hierarchical Patent Classification Using Transformers. Advances in Information Retrieval (513-528). Online publication date: 27-Mar-2021. https://doi.org/10.1007/978-3-030-72113-8_34

Bai J, Shim I, Park S. (2020). MEXN: Multi-Stage Extraction Network for Patent Document Classification. Applied Sciences 10:18 (6229). Online publication date: 8-Sep-2020. https://doi.org/10.3390/app10186229
