DOI: 10.1145/3583780.3614809
Research article

Class-Specific Word Sense Aware Topic Modeling via Soft Orthogonalized Topics

Published: 21 October 2023

Abstract

We propose a word sense aware topic model for document classification based on soft orthogonalized topics. An essential problem for this task is to capture word senses related to classes, i.e., class-specific word senses. Traditional models mainly introduce semantic information from knowledge libraries for word sense discovery. However, this information may not align with the classification targets, because these targets are often subjective and task-related. We instead aim to model class-specific word senses in topic space. The challenge is to optimize the class separability of the senses, i.e., to obtain sense vectors with (a) high intra-class and (b) low inter-class similarities. Most existing models predefine specific topics for each class to specify the class-specific sense vectors; we call these hard orthogonalization based methods. Such methods can hardly achieve both (a) and (b), since they assume the conditional independence of topics from classes and inevitably lose topic information. To address this problem, we propose soft orthogonalization of topics. Specifically, we retain all the topics and introduce a group of class-specific weights for each word to control how much each topic dimension contributes to class separability. In addition, we detect highly class-specific words in each document and use them to guide sense estimation. Experiments on two standard datasets show that our proposal outperforms state-of-the-art models in terms of accuracy of sense estimation, document classification, and topic modeling. Furthermore, joint learning experiments with the pre-trained language model BERT show that our model complements BERT better than competing topic models in most cases.
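To make the contrast between hard and soft orthogonalization concrete, below is a minimal NumPy sketch, not the authors' implementation: all arrays are random stand-ins for parameters that the actual model learns jointly with the topics. It shows how hard orthogonalization discards topic mass by restricting each class to its own topic block, while soft orthogonalization keeps all topic dimensions and only reweights them per class.

```python
import numpy as np

rng = np.random.default_rng(0)
K, C = 20, 4  # number of topics, number of classes (illustrative sizes)

# Topic-space sense vector of one word occurrence, e.g. its posterior
# topic distribution under an LDA-style model (random stand-in here).
sense = rng.dirichlet(np.ones(K))

# Hard orthogonalization: each class owns a disjoint block of topics,
# so a class-specific sense vector keeps only "its" topic dimensions.
blocks = np.array_split(np.arange(K), C)
hard = np.zeros((C, K))
for c, idx in enumerate(blocks):
    hard[c, idx] = sense[idx]  # topic mass outside the block is discarded

# Soft orthogonalization: every class sees all K topics, but reweights
# them with class-specific, per-word weights (random stand-ins here).
W = rng.random((C, K))                   # class-specific topic weights
soft = W * sense                         # Hadamard product, shape (C, K)
soft /= soft.sum(axis=1, keepdims=True)  # renormalize to distributions

# Hard orthogonalization throws away most of each sense vector ...
print("topic mass kept per class (hard):", hard.sum(axis=1).round(3))
# ... while the soft variant retains every topic dimension, merely
# emphasizing the ones that matter for separating the classes.
print("nonzero dimensions per class (soft):", (soft > 0).sum(axis=1))
```

In the actual model the class-specific weights are inferred so that the resulting sense vectors have high intra-class and low inter-class similarity; the sketch only illustrates the shape of the computation.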

Supplementary Material

MP4 File (full0124-video.mp4)
In our research, we introduce a word sense aware topic model tailored for document classification, emphasizing class-specific word senses. Traditional methods often rely on general knowledge libraries for word sense discovery, which may not always align with specific classification goals. Our solution? Soft orthogonalization for topics. Unlike existing models that predefine topics for each class, our approach retains all topics and assigns class-specific weights to each word, optimizing for high intra-class and low inter-class similarity of the resulting sense vectors. We also leverage highly class-specific words in documents for better sense estimation. Our tests on two benchmark datasets reveal superior performance in sense estimation, document classification, and topic modeling. When combined with the well-known BERT model, our method consistently shows enhanced complementarity. Watch this video to learn more about our proposed approach.
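As a companion to the weighting sketch above, here is one plausible way to score how class-specific a word is, assuming access to per-word class posteriors P(class | word) estimated from labeled training documents. The entropy-based score below is an illustrative stand-in, not necessarily the paper's exact detection criterion: words whose occurrences concentrate in one class score near 1 and could then be used to guide sense estimation for the surrounding document.

```python
import numpy as np

def class_specificity(p_class_given_word: np.ndarray) -> float:
    """Map a word's class posterior to a specificity score in [0, 1].

    A word concentrated in one class has a peaked, low-entropy
    distribution; we normalize its entropy by the maximum (uniform)
    entropy and invert, so 1 means fully class-specific.
    """
    p = p_class_given_word / p_class_given_word.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))  # epsilon avoids log(0)
    return float(1.0 - entropy / np.log(len(p)))

# Hypothetical class posteriors over 4 classes:
print(class_specificity(np.array([0.91, 0.03, 0.03, 0.03])))  # high: ~0.71
print(class_specificity(np.array([0.25, 0.25, 0.25, 0.25])))  # ~0: uninformative
```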


Cited By

  • (2025) Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset. IEEE Open Journal of the Computer Society 6, 153--164. DOI: 10.1109/OJCS.2024.3519747

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023, 5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document classification
  2. document representation
  3. graphical model
  4. topic model
  5. word sense disambiguation

Funding Sources

  • Ministry of Science and Technology of the People's Republic of China
  • Zhejiang Laboratory

Conference

CIKM '23

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

