DOI: 10.1145/3185089.3185113
research-article

Overlapping Clustering for Textual Data

Published: 08 February 2018

Abstract

Textual data is inherently overlapping, so overlapping clustering algorithms are more appropriate for it. A major challenge is that these algorithms are very slow on large volumes of textual data. OKM and OSOM are two important overlapping clustering algorithms. In this study, we implemented both and compared their performance. Our experimental results show that OKM produces clusters with better overlap sizes when the algorithms are applied to textual data. Because both algorithms take a long time to complete, neither is suitable for clustering textual data. We therefore devise a fast overlapping version of SOM that is suitable for this purpose.
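To make the idea of overlapping assignment concrete, the following is a minimal sketch of a greedy multi-assignment step in the style usually described for OKM: a point joins its nearest cluster, and further clusters are added only while the mean of the assigned centroids moves closer to the point. This is an illustration under stated assumptions, not the implementation evaluated in the paper. The function names, the use of Euclidean distance, and the toy centroids are all hypothetical; text would normally be represented by term-weight vectors rather than 2-D points.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def distance(a, b):
    """Euclidean distance (an assumption; other measures could be swapped in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def okm_style_assign(point, centroids):
    """Return the sorted indices of the clusters the point is assigned to.

    Greedy heuristic: visit centroids from nearest to farthest and keep
    adding one as long as the mean of the assigned centroids (the point's
    "image") gets strictly closer to the point.
    """
    order = sorted(range(len(centroids)),
                   key=lambda k: distance(point, centroids[k]))
    assigned = [order[0]]
    best = distance(point, centroids[order[0]])
    for k in order[1:]:
        candidate = assigned + [k]
        image = mean_vector([centroids[j] for j in candidate])
        d = distance(point, image)
        if d < best:
            assigned, best = candidate, d
        else:
            break  # adding this cluster no longer improves the image
    return sorted(assigned)
```

With the hypothetical centroids [[0, 0], [2, 0], [10, 10]], the point [1, 0] lies midway between the first two centroids and is assigned to both, while [0.1, 0] stays in the first cluster only. Running such a step for every document in every iteration hints at why the cost grows quickly on large collections.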

Cited By

  • (2021) FOCT: Fast Overlapping Clustering for Textual Data. IEEE Access 9, 157670-157680. DOI: 10.1109/ACCESS.2021.3130094. Online publication date: 2021

Published In

ACM Other conferences
ICSCA '18: Proceedings of the 2018 7th International Conference on Software and Computer Applications
February 2018
349 pages
ISBN: 9781450354141
DOI: 10.1145/3185089
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • University of Tokyo

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OKM
  2. OSOM
  3. Overlapping clustering algorithm
  4. Textual data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSCA 2018
