Abstract:
Mining organization-related topics helps in analyzing how information is disseminated. Existing methods based on graph neural networks mainly consider the associations between words and documents. They ignore the semantic interactions between documents and do not consider the heterogeneity of edges, which makes it difficult to handle the blurred topic boundaries found in real scenarios and results in a loss of performance. This paper proposes a BERT-based Heterogeneous Graph Convolution Network (BERT-HGCN) for semi-supervised topic mining that comprehensively considers multiple semantic relations between words and documents, deeply combining the advantages of transductive learning with those of pre-trained models. We model documents as graph-structured data and capture word-word, word-doc, and doc-doc semantic dependencies via an information propagation mechanism. During training, a two-stream encoding mechanism learns structural and semantic representations by combining a hierarchical graph convolution network (HGCN) with a BERT-based auto-encoder, so that both edge heterogeneity and the semantics of the original documents are taken into account. Finally, a dual-supervision loss trains the classifier on both graph-node and semantic representations for topic mining. We empirically evaluate the proposed model on a real-world organization-related dataset, and the experimental results demonstrate its efficacy.
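The abstract does not specify how edge weights are computed or exactly how the two streams are combined, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. In the style of TextGCN, word-doc edges use (smoothed) TF-IDF weights and word-word edges use positive PMI, simplified here to document-level co-occurrence instead of sliding windows. Doc-doc edges use TF-IDF cosine similarity above a hypothetical `doc_sim_threshold`. The layer sums one normalized propagation per edge type, each with its own weight matrix (an R-GCN-style stand-in for the paper's HGCN). The dual-supervision loss is assumed to be graph cross-entropy plus `lam` times semantic cross-entropy. All function and parameter names are hypothetical, and the BERT auto-encoder stream is omitted; `semantic_logits` stands in for its classifier output.

```python
import math
import random
from collections import Counter

EDGE_TYPES = ("word-doc", "word-word", "doc-doc")

def build_hetero_graph(docs, doc_sim_threshold=0.1):
    """Hypothetical graph builder: nodes are docs first, then vocabulary words."""
    vocab = sorted({w for d in docs for w in d})
    n_doc = len(docs)
    index = {w: n_doc + i for i, w in enumerate(vocab)}
    n = n_doc + len(vocab)
    adj = {t: [[0.0] * n for _ in range(n)] for t in EDGE_TYPES}
    df = Counter(w for d in docs for w in set(d))  # document frequency per word

    # word-doc edges: TF-IDF with smoothed IDF (assumption, TextGCN-style)
    tfidf = []
    for i, d in enumerate(docs):
        vec = {w: (c / len(d)) * (math.log((1 + n_doc) / (1 + df[w])) + 1.0)
               for w, c in Counter(d).items()}
        tfidf.append(vec)
        for w, v in vec.items():
            adj["word-doc"][i][index[w]] = adj["word-doc"][index[w]][i] = v

    # word-word edges: positive PMI over document-level co-occurrence (assumption)
    co = Counter()
    for d in docs:
        ws = sorted(set(d))
        for a in range(len(ws)):
            for b in range(a + 1, len(ws)):
                co[(ws[a], ws[b])] += 1
    for (u, v), c in co.items():
        pmi = math.log(c * n_doc / (df[u] * df[v]))
        if pmi > 0:
            adj["word-word"][index[u]][index[v]] = adj["word-word"][index[v]][index[u]] = pmi

    # doc-doc edges: cosine similarity of TF-IDF vectors above a threshold (assumption)
    def cos(a, b):
        dot = sum(x * b.get(w, 0.0) for w, x in a.items())
        na = math.sqrt(sum(x * x for x in a.values()))
        nb = math.sqrt(sum(x * x for x in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    for i in range(n_doc):
        for j in range(i + 1, n_doc):
            s = cos(tfidf[i], tfidf[j])
            if s > doc_sim_threshold:
                adj["doc-doc"][i][j] = adj["doc-doc"][j][i] = s
    return vocab, adj

def normalize(a):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(a)
    a = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def hetero_gcn_layer(adj_norm, h, weights):
    """ReLU(sum over edge types r of A_r @ H @ W_r): one weight matrix per edge type."""
    out = None
    for t, a in adj_norm.items():
        msg = matmul(matmul(a, h), weights[t])
        out = msg if out is None else [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(out, msg)]
    return [[max(0.0, x) for x in row] for row in out]

def cross_entropy(logits, label):
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[label]

def dual_supervision_loss(graph_logits, semantic_logits, labels, lam=0.5):
    """Assumed form: mean over labelled docs of CE(graph) + lam * CE(semantic)."""
    total = sum(cross_entropy(g, y) + lam * cross_entropy(s, y)
                for g, s, y in zip(graph_logits, semantic_logits, labels))
    return total / len(labels)

# Toy usage: three tokenized documents, one-hot node features, 2 output classes.
docs = [["merger", "bank", "deal"], ["bank", "loan", "rate"], ["match", "team", "goal"]]
vocab, adj = build_hetero_graph(docs)
adj_norm = {t: normalize(a) for t, a in adj.items()}
n = len(docs) + len(vocab)
h0 = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
random.seed(0)
weights = {t: [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(n)] for t in EDGE_TYPES}
h1 = hetero_gcn_layer(adj_norm, h0, weights)
```

In this toy graph the first two documents share the word "bank", so they receive a doc-doc edge, while the third document stays disconnected from both at the doc-doc level. Keeping a separate weight matrix per edge type is one simple way to let word-word, word-doc, and doc-doc relations contribute differently to a node's representation.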
Date of Conference: 18-23 July 2022
Date Added to IEEE Xplore: 30 September 2022