An Online Dirichlet Model based on Sentence Embedding and DBSCAN for Noisy Short Text Stream Clustering | IEEE Conference Publication | IEEE Xplore

Abstract:

Short text stream clustering has received widespread attention due to the rise of social media. However, short text streams are characterized by infinite length, text sparsity and ambiguity, topic evolution, and noisy data. Existing short text clustering methods do not make full use of the semantic information of short texts to address their sparsity and ambiguity, and few methods account for noise in short text streams. Therefore, in this paper, we propose an Online Dirichlet model based on Sentence Embedding and DBSCAN for noisy short text stream clustering, called ODSE. First, to handle text sparsity and ambiguity, we use Sentence-BERT to represent each short text, capturing its global semantic information. Second, to handle the noisy data contained in short texts, we introduce a buffer mechanism and, in addition to using the sentence embeddings as input, refine the Dirichlet process multinomial mixture model with the DBSCAN algorithm. The model can handle short texts one by one. In addition, to adapt to the infinite length and topic evolution, we introduce a forgetting mechanism to update the clusters. Finally, extensive experiments demonstrate that, compared with several state-of-the-art algorithms, our proposed approach achieves better performance on four benchmark short text datasets.
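The abstract does not give formulas for the buffer or the forgetting mechanism, so the following is only a rough sketch of how two of the named ideas might look in isolation: flagging noise in a buffer using the standard DBSCAN noise definition, and decaying cluster weights over time. Everything here is an assumption made for illustration, not the authors' ODSE implementation: the function names, the cosine distance, the exponential decay form, the parameter values, and the toy 2-D vectors standing in for Sentence-BERT embeddings. The Dirichlet process multinomial mixture step is omitted entirely.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan_noise(points, eps, min_pts):
    """Return indices of points that DBSCAN would label as noise.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it. A point is noise if it is not a core point and
    no core point lies within eps of it.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]


def decay_clusters(weights, decay_rate, elapsed, min_weight):
    """Hypothetical forgetting step: exponentially decay each cluster
    weight and drop clusters whose weight falls below min_weight."""
    factor = math.exp(-decay_rate * elapsed)
    decayed = {cid: w * factor for cid, w in weights.items()}
    return {cid: w for cid, w in decayed.items() if w >= min_weight}


# Toy buffer: three vectors pointing in nearly the same direction, one outlier.
buffer = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0)]
noise = dbscan_noise(buffer, eps=0.3, min_pts=2)

# Toy cluster weights: the small cluster is forgotten after two time units.
remaining = decay_clusters({"a": 10.0, "b": 1.0},
                           decay_rate=0.5, elapsed=2.0, min_weight=1.0)
```

In this toy run the outlier at index 3 is the only point flagged as noise, and only cluster "a" survives the decay step. A real pipeline would replace the toy vectors with sentence embeddings and would need to tune `eps`, `min_pts`, and the decay parameters on actual data.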
Date of Conference: 18-23 July 2022
Date Added to IEEE Xplore: 30 September 2022
Conference Location: Padua, Italy
