research-article

Finding disjoint clusters in a categorical data space

Authors:
Mohamed Azmi

AMIPS Research Team, EMI, Mohammed V University of Rabat, Rabat, Morocco

Abdelaziz Berrado

AMIPS Research Team, EMI, Mohammed V University of Rabat, Rabat, Morocco

SITA'18: Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications, October 2018, Article No. 43, Pages 1–5. https://doi.org/10.1145/3289402.3289543

Published: 24 October 2018

ABSTRACT

In this paper, we present a prototype method for segmenting high-dimensional categorical data using frequent patterns. The frequent patterns are mined with a conventional frequent pattern mining algorithm according to a predefined support threshold. In addition, we restrict the length of the frequent patterns to a predefined low value to keep the results understandable. Associations between the frequent patterns are then discovered to reveal containment and overlap among them. Segments are defined iteratively, each as the largest region of the data space covered by several frequent patterns. An illustrative example shows promising results in terms of the quality and understandability of the resulting segments.
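The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the brute-force miner stands in for a conventional algorithm such as Apriori or FP-growth, and the greedy rule (pick the pattern covering the most unassigned rows, then list the patterns contained in it) is an assumed simplification of how segments are built from several patterns. The names `mine_frequent_patterns`, `disjoint_segments`, and `toy_rows`, and the toy data itself, are hypothetical.

```python
from itertools import combinations


def mine_frequent_patterns(rows, min_support, max_len):
    """Brute-force miner (illustrative stand-in for Apriori or FP-growth).

    Returns {pattern: cover}, where a pattern is a frozenset of
    (attribute, value) items and its cover is the set of matching row indices.
    """
    min_rows = min_support * len(rows)
    # Cover of every single (attribute, value) item.
    item_cover = {}
    for i, row in enumerate(rows):
        for item in row.items():
            item_cover.setdefault(item, set()).add(i)

    level = {frozenset([it]): cov for it, cov in item_cover.items()
             if len(cov) >= min_rows}
    frequent = dict(level)
    for k in range(2, max_len + 1):  # pattern length is capped at max_len
        items = sorted({it for pattern in level for it in pattern})
        level = {}
        for combo in combinations(items, k):
            # A row has one value per attribute, so skip repeated attributes.
            if len({attr for attr, _ in combo}) < k:
                continue
            cover = set.intersection(*(item_cover[it] for it in combo))
            if len(cover) >= min_rows:
                level[frozenset(combo)] = cover
        if not level:
            break
        frequent.update(level)
    return frequent


def disjoint_segments(rows, min_support=0.3, max_len=2):
    """Greedy segmentation (assumed simplification, not the paper's exact rule).

    Each iteration takes the pattern covering the most unassigned rows as a
    segment and records the other patterns whose cover it contains.
    """
    patterns = mine_frequent_patterns(rows, min_support, max_len)
    min_rows = min_support * len(rows)
    unassigned = set(range(len(rows)))
    segments = []
    while unassigned:
        remaining = {p: cov & unassigned for p, cov in patterns.items()}
        # Ties are broken toward longer patterns, then deterministically.
        seed = max(remaining,
                   key=lambda p: (len(remaining[p]), len(p), sorted(p)),
                   default=None)
        if seed is None or not remaining[seed] or len(remaining[seed]) < min_rows:
            break
        region = remaining[seed]
        contained = [p for p, cov in remaining.items()
                     if p != seed and cov and cov <= region]
        segments.append({"seed": seed, "rows": sorted(region),
                         "contained": contained})
        unassigned -= region  # later segments cannot reuse these rows
    return segments


# Hypothetical toy data: six rows described by three categorical attributes.
toy_rows = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
    {"color": "blue", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "square", "size": "large"},
]
segments = disjoint_segments(toy_rows, min_support=0.3, max_len=2)
```

On this toy data the sketch yields two disjoint segments, rows 0 to 2 (seeded by color=red and shape=round) and rows 3 to 5 (seeded by color=blue and shape=square). Removing assigned rows before the next iteration is what keeps the segments disjoint; the length cap `max_len` mirrors the restriction on pattern length described above.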

Published in
SITA'18: Proceedings of the 12th International Conference on Intelligent Systems: Theories and Applications
October 2018
301 pages
ISBN: 9781450364621
DOI: 10.1145/3289402
Conference Chairs:
Abdelaziz Berrado,
Zohra Bakkoury,
Program Chairs:
Bernadette Bouchon-Meunier,
Mohammed Ramdani
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
frequent pattern
segmentation
subspace clustering
Qualifiers
- research-article
- Research
- Refereed limited