poster

Clustering short text using Ncut-weighted non-negative matrix factorization

Published: 29 October 2012

Abstract

Non-negative matrix factorization (NMF) has been successfully applied to document clustering. However, experiments on short texts, such as microblogs, Q&A documents and news titles, suggest that NMF performs unsatisfactorily. A major reason is that traditional term weighting schemes, such as binary weights and tf-idf, fail to capture the discriminative power and importance of terms in short texts because the data are sparse. To tackle this problem, we propose a novel term weighting scheme for NMF, derived from the Normalized Cut (Ncut) problem on the term affinity graph. Unlike idf, which emphasizes discriminability at the document level, the Ncut weighting measures the discriminability of terms at the term level. Experiments on two data sets show that our weighting scheme significantly boosts NMF's performance on short text clustering.
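The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea of weighting terms by their position in a term affinity graph before running NMF. It assumes that term affinity is document-level co-occurrence and that each term is scaled by the inverse square root of its graph degree, a normalization borrowed from normalized-cut spectral clustering. Both choices, and every function name (`term_affinity_degrees`, `weighted_term_doc_matrix`, `nmf`), are illustrative assumptions and should not be read as the paper's derived scheme.

```python
import math
import random


def term_affinity_degrees(docs, vocab):
    """Degree of each term in a co-occurrence graph (assumed affinity:
    number of documents in which two terms appear together)."""
    index = {t: i for i, t in enumerate(vocab)}
    degree = [0.0] * len(vocab)
    for doc in docs:
        terms = {t for t in doc if t in index}
        for t in terms:
            # t co-occurs with every distinct term in this doc, itself included
            degree[index[t]] += len(terms)
    return degree


def weighted_term_doc_matrix(docs, vocab):
    """Term-by-document count matrix with each term row scaled by
    degree ** -0.5 (an assumed stand-in for the Ncut-derived weight)."""
    degree = term_affinity_degrees(docs, vocab)
    X = []
    for i, t in enumerate(vocab):
        w = 1.0 / math.sqrt(degree[i]) if degree[i] > 0 else 0.0
        X.append([w * doc.count(t) for doc in docs])
    return X


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Plain multiplicative-update NMF: X (m x n) ~ W (m x k) @ H (k x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H


# Toy short texts (made-up data, for illustration only).
docs = [
    ["apple", "banana", "fruit"],
    ["banana", "fruit", "juice"],
    ["apple", "juice"],
    ["python", "code", "bug"],
    ["code", "bug", "fix"],
    ["python", "fix"],
]
vocab = sorted({t for doc in docs for t in doc})

X = weighted_term_doc_matrix(docs, vocab)
W, H = nmf(X, k=2)

# Assign each document to the latent factor with the largest coefficient.
labels = [max(range(2), key=lambda a: H[a][j]) for j in range(len(docs))]
print(labels)
```

The weighting is applied to rows of the term-by-document matrix before factorization, so any NMF solver could be substituted for the small multiplicative-update loop used here.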



Information & Contributors

Information

Published In

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN:9781450311564
DOI:10.1145/2396761

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NMF
  2. clustering
  3. normalized cut
  4. short text

Qualifiers

  • Poster

Conference

CIKM'12

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 08 Mar 2025

Citations

Cited By

  • (2024) Double-target self-supervised clustering with multi-feature fusion for medical question texts. PeerJ Computer Science 10 (e2075). DOI: 10.7717/peerj-cs.2075. Online publication date: 28-Jun-2024
  • (2024) Public perception of cultural ecosystem services in historic districts based on biterm topic model. Scientific Reports 14:1. DOI: 10.1038/s41598-024-62770-0. Online publication date: 22-May-2024
  • (2024) Revolutionary text clustering: Investigating transfer learning capacity of SBERT models through pooling techniques. Engineering Science and Technology, an International Journal 55 (101730). DOI: 10.1016/j.jestch.2024.101730. Online publication date: Jul-2024
  • (2023) Comparison of Clustering Methods on Iris Dataset. 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC) (86-92). DOI: 10.1109/ICFTIC59930.2023.10456161. Online publication date: 17-Nov-2023
  • (2022) An effective short-text topic modelling with neighbourhood assistance-driven NMF in Twitter. Social Network Analysis and Mining 12:1. DOI: 10.1007/s13278-022-00898-5. Online publication date: 24-Jul-2022
  • (2022) Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis. Artificial Intelligence Review 56:6 (5133-5260). DOI: 10.1007/s10462-022-10254-w. Online publication date: 26-Oct-2022
  • (2021) Clustering Introductory Computer Science Exercises Using Topic Modeling Methods. IEEE Transactions on Learning Technologies 14:1 (42-54). DOI: 10.1109/TLT.2021.3056907. Online publication date: Feb-2021
  • (2021) A Scalable Short-Text Clustering Algorithm Using Apache Spark. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) (927-934). DOI: 10.1109/ICTAI52525.2021.00149. Online publication date: Nov-2021
  • (2021) News recommender system: a review of recent progress, challenges, and opportunities. Artificial Intelligence Review 55:1 (749-800). DOI: 10.1007/s10462-021-10043-x. Online publication date: 21-Jul-2021
  • (2020) Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections. 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) (813-820). DOI: 10.1109/ICTAI50040.2020.00129. Online publication date: Nov-2020
