S²-HTC: Hierarchical Text Classification via Fusing the Structural and Semantic Information

Shen, Yinghan; Yan, Yu; Yin, Dechun; Shen, Huawei

doi:10.1007/978-981-97-5569-1_4

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14854))

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Hierarchical Text Classification (HTC) provides a robust mechanism for systematic text categorization, addressing diverse requirements of text understanding and retrieval. A key issue in HTC is feature learning for long-tail distributed labels through the use of label relations. Most current HTC approaches focus on label structures and overlook the intricate semantic details of long-tail labels. This often leads to inadequate modeling of these less frequent but semantically rich labels, compromising overall classification accuracy. In this study, we propose S²-HTC (Hierarchical Text Classification via Fusing the Structural and Semantic Information), which performs hierarchical classification by combining structural and semantic label relations with a balanced loss. Specifically, S²-HTC introduces the Label Semantic-Aware and Hierarchical Adjacency Matrix (LSA-HAM) to simultaneously capture and integrate the hierarchical and semantic associations of labels. To address the long-tailed label distributions inherent in HTC, S²-HTC adopts a balanced loss computation to efficiently represent low-level label features. Experimental results demonstrate that S²-HTC outperforms state-of-the-art approaches.
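
The two ideas named in the abstract, a label matrix that mixes hierarchical and semantic links and a loss that up-weights rare labels, can be illustrated with a small sketch. The abstract does not give the construction of LSA-HAM or the exact balanced loss, so everything below is an assumption made for illustration: the function names, the mixing weight `alpha`, the use of clipped cosine similarity between toy label embeddings, and the choice of effective-number class weights (one common balancing scheme) are hypothetical and not taken from the paper.

```python
import math


def hierarchy_adjacency(labels, parent_of):
    """Symmetric 0/1 matrix: 1 on the diagonal and for each parent-child pair."""
    index = {name: i for i, name in enumerate(labels)}
    n = len(labels)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, parent in parent_of.items():
        i, j = index[child], index[parent]
        adj[i][j] = adj[j][i] = 1.0
    return adj


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_adjacency(labels, parent_of, embeddings, alpha=0.5):
    """Assumed fusion rule: alpha * structure + (1 - alpha) * clipped cosine."""
    struct = hierarchy_adjacency(labels, parent_of)
    n = len(labels)
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sem = max(0.0, cosine(embeddings[labels[i]], embeddings[labels[j]]))
            fused[i][j] = alpha * struct[i][j] + (1 - alpha) * sem
    return fused


def class_balanced_weights(counts, beta=0.999):
    """Effective-number weights (1 - beta) / (1 - beta**n), scaled to mean 1.

    Every count must be at least 1, otherwise the denominator is zero.
    """
    raw = [(1 - beta) / (1 - beta ** n) for n in counts]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Toy hierarchy: "science" is the parent of "physics" and "chemistry".
labels = ["science", "physics", "chemistry"]
parent_of = {"physics": "science", "chemistry": "science"}
# Made-up 2-d label embeddings, for illustration only.
embeddings = {
    "science": [1.0, 0.2],
    "physics": [0.9, 0.1],
    "chemistry": [0.8, 0.3],
}

struct = hierarchy_adjacency(labels, parent_of)
fused = fused_adjacency(labels, parent_of, embeddings, alpha=0.5)
weights = class_balanced_weights([1000, 50, 5])
```

In this toy setup the sibling labels "physics" and "chemistry" have no structural edge, yet they receive a nonzero entry in the fused matrix because their embeddings are similar. The weights grow as the label count shrinks, so a per-label loss multiplied by them would emphasize the rare labels.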

Yinghan Shen and Yu Yan are co-first authors. This research was supported by the Natural Science Foundation Program (Grant Nos. U21B2046 and 62172393).

Author information

Authors and Affiliations

Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yinghan Shen & Huawei Shen
School of Information and Cyber Security, People’s Public Security University of China, Beijing, China
Yu Yan & Dechun Yin
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yinghan Shen, Yu Yan & Huawei Shen

Corresponding author

Correspondence to Yinghan Shen.

Editor information

Editors and Affiliations

Osaka University, Suita, Osaka, Japan
Makoto Onizuka
KAIST, Daejeon, Korea (Republic of)
Jae-Gil Lee
Beihang University, Beijing, China
Yongxin Tong
Osaka University, Osaka, Japan
Chuan Xiao
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
University of Grenoble Alpes, Saint-Martin d’Hères, France
Sihem Amer-Yahia
University of Michigan, Ann Arbor, MI, USA
H. V. Jagadish
Nagoya University, Nagoya, Japan
Kejing Lu

About this paper

Cite this paper

Shen, Y., Yan, Y., Yin, D., Shen, H. (2024). S²-HTC: Hierarchical Text Classification via Fusing the Structural and Semantic Information. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14854. Springer, Singapore. https://doi.org/10.1007/978-981-97-5569-1_4

DOI: https://doi.org/10.1007/978-981-97-5569-1_4
Published: 13 December 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5568-4
Online ISBN: 978-981-97-5569-1
eBook Packages: Computer Science, Computer Science (R0)
