DOI: 10.1145/3395027.3419589
short-paper

Short Text Stream Clustering via Frequent Word Pairs and Reassignment of Outliers to Clusters

Published: 29 September 2020

ABSTRACT

Short text stream clustering is an important but challenging task because social media platforms generate massive amounts of text. Given a stream of texts, the proposed method clusters the texts based on frequently occurring word pairs (not necessarily consecutive) in them. It then detects outliers in the clusters and reassigns each outlier to an appropriate cluster using the semantic similarity between the outlier and the clusters, with dynamically computed similarity thresholds. Thus the proposed method deals efficiently with the concept drift problem. Experimental results demonstrate that the proposed approach outperforms state-of-the-art short text stream clustering algorithms by a statistically significant margin on several short text datasets.
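The pipeline described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names (`word_pairs`, `cluster_batch`), the `min_pair_count` parameter, and both threshold rules are assumptions made for illustration, and a bag-of-words cosine similarity stands in for the semantic similarity the paper uses. The sketch also handles a single batch rather than a continuous stream.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def word_pairs(text):
    """Unordered pairs of distinct words in a text; the words need not be adjacent."""
    return set(combinations(sorted(set(text.lower().split())), 2))


def cosine(a, b):
    """Cosine similarity of two word-count Counters (stand-in for semantic similarity)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors, members):
    """Sum the word counts of the member texts."""
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return total


def cluster_batch(texts, min_pair_count=2):
    """Cluster one batch of texts; returns {cluster key: [text indices]}."""
    vectors = [Counter(t.lower().split()) for t in texts]
    pair_counts = Counter(p for t in texts for p in word_pairs(t))

    # Step 1: group each text under its most frequent word pair.
    # Texts with no frequent pair start out as outliers.
    clusters, outliers = defaultdict(list), []
    for i, t in enumerate(texts):
        frequent = [p for p in word_pairs(t) if pair_counts[p] >= min_pair_count]
        if frequent:
            key = max(frequent, key=lambda p: (pair_counts[p], p))
            clusters[key].append(i)
        else:
            outliers.append(i)

    # Step 2: flag members unusually far from their cluster centroid.
    # Assumed rule: similarity below (mean - standard deviation) of the cluster.
    for key, members in clusters.items():
        center = centroid(vectors, members)
        sims = [cosine(vectors[i], center) for i in members]
        threshold = mean(sims) - pstdev(sims)
        kept = [i for i, s in zip(members, sims) if s >= threshold]
        outliers.extend(i for i in members if i not in kept)
        clusters[key] = kept

    # Step 3: reassign each outlier to its most similar cluster, but only if
    # that similarity exceeds the mean of its similarities to all clusters
    # (an assumed, per-outlier dynamic threshold).
    centers = {key: centroid(vectors, m) for key, m in clusters.items()}
    for i in outliers:
        sims = {key: cosine(vectors[i], c) for key, c in centers.items()}
        best = max(sims, key=sims.get) if sims else None
        if best is not None and sims[best] > mean(list(sims.values())):
            clusters[best].append(i)
        else:
            clusters[("outlier", i)].append(i)  # keep as a new singleton cluster
    return dict(clusters)


if __name__ == "__main__":
    batch = [
        "apple iphone launch",
        "apple iphone price",
        "rain floods city",
        "rain floods streets",
        "iphone sale",
        "football match tonight",
    ]
    for key, members in cluster_batch(batch).items():
        print(key, [batch[i] for i in members])
```

On this toy batch, "iphone sale" shares no frequent pair with any text but is reassigned to the apple/iphone cluster, while "football match tonight" overlaps with nothing and stays a singleton. The reassignment rule in this sketch needs at least two clusters to be meaningful, since with one cluster the best similarity always equals the mean.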

Published in

DocEng '20: Proceedings of the ACM Symposium on Document Engineering 2020
September 2020
130 pages
ISBN: 9781450380003
DOI: 10.1145/3395027

Copyright © 2020 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• short-paper
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 178 of 537 submissions, 33%
