High-dimensional clustering: a clique-based hypergraph partitioning framework

Hu, Tianming; Liu, Chuanren; Tang, Yong; Sun, Jing; Xiong, Hui; Sung, Sam Yuan

doi:10.1007/s10115-012-0609-3

Regular Paper
Published: 09 January 2013

Volume 39, pages 61–88, (2014)
Knowledge and Information Systems

Abstract

Hypergraph partitioning has been considered a promising method for addressing the challenges of high-dimensional clustering. With objects modeled as vertices and the relationships among objects captured by hyperedges, the goal of hypergraph partitioning is to minimize the edge cut. Therefore, the definition of hyperedges is vital to the clustering performance. While several definitions of hyperedges have been proposed, a systematic understanding of the desired characteristics of hyperedges is still missing. To that end, in this paper, we first provide a unified clique perspective on the definition of hyperedges, which serves as a guide for defining them. With this perspective, and based on the concepts of shared (reverse) nearest neighbors, we propose two new types of clique hyperedges and analyze their properties with respect to purity and size. Finally, we present an extensive evaluation using real-world document datasets. The experimental results show that, with shared (reverse) nearest neighbor-based hyperedges, the clustering performance can be improved significantly in terms of various external validation measures without the need for fine-tuning of parameters.
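To make the setting concrete, the following is a minimal sketch of the general idea, not the hyperedge definitions proposed in the paper (the abstract does not give them). It assumes cosine similarity between document vectors, builds one illustrative hyperedge per object from its k nearest neighbors, counts shared nearest neighbors between two objects, and evaluates the cut of a candidate partition as the number of hyperedges that span more than one part. All function names and the toy data are hypothetical.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


def knn(points, k):
    """For each object i, return the set of its k most similar other objects."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(p, points[j]), reverse=True)
        neighbors.append(set(ranked[:k]))
    return neighbors


def knn_hyperedges(neighbors):
    """Illustrative assumption: one hyperedge per object, made of the
    object itself together with its k nearest neighbors."""
    return [frozenset({i} | nbrs) for i, nbrs in enumerate(neighbors)]


def snn_count(neighbors, i, j):
    """Number of nearest neighbors that objects i and j have in common."""
    return len(neighbors[i] & neighbors[j])


def hyperedge_cut(hyperedges, labels):
    """Number of hyperedges whose vertices fall into more than one part."""
    return sum(1 for e in hyperedges if len({labels[v] for v in e}) > 1)


# Toy data: two well-separated groups of three 2-D vectors each.
points = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2),
          (0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
neighbors = knn(points, k=2)
hyperedges = knn_hyperedges(neighbors)

good_cut = hyperedge_cut(hyperedges, [0, 0, 0, 1, 1, 1])
bad_cut = hyperedge_cut(hyperedges, [0, 0, 1, 1, 1, 1])
print(good_cut, bad_cut)
```

On this toy data the partition that follows the two groups cuts no hyperedge, while the partition that misplaces one object cuts several, which is the sense in which the choice of hyperedges drives the quality of the resulting clusters.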

Acknowledgments

We would like to thank the editor and reviewers for their valuable comments. This work was supported by NSFC (61100136, 61272067, 70890082, 71028002), GDNSF (S2012030006242), and NSF (CCF-1018151).

Author information

Authors and Affiliations

Dongguan University of Technology, Dongguan, China
Tianming Hu
Department of Management Science and Information Systems, Rutgers, The State University of New Jersey, Newark, NJ, USA
Chuanren Liu & Hui Xiong
South China Normal University, Guangzhou, China
Yong Tang
University of Auckland, Auckland, New Zealand
Jing Sun
South Texas College, McAllen, TX, USA
Sam Yuan Sung

Corresponding author

Correspondence to Chuanren Liu.

About this article

Cite this article

Hu, T., Liu, C., Tang, Y. et al. High-dimensional clustering: a clique-based hypergraph partitioning framework. Knowl Inf Syst 39, 61–88 (2014). https://doi.org/10.1007/s10115-012-0609-3

Received: 28 February 2012
Revised: 11 August 2012
Accepted: 28 December 2012
Published: 09 January 2013
Issue Date: April 2014
DOI: https://doi.org/10.1007/s10115-012-0609-3
