Geometric double-entity model for recognizing far-near relations of clusters

Wang, KaiJun; Yan, XuanHui; Chen, LiFei

doi:10.1007/s11432-011-4386-5

Research Papers
Published: 15 September 2011

Volume 54, pages 2040–2050, (2011)
Science China Information Sciences

KaiJun Wang¹,
XuanHui Yan¹ &
LiFei Chen¹

Abstract

Many practical problems require not only the sample labels produced by a clustering algorithm but also knowledge of the far-near relations between clusters. When a high-dimensional data set contains many clusters, clustering visualization methods based on dimension reduction often distort the picture: some clusters overlap, interlace, or are pushed apart, so the far-near relations of some clusters are displayed wrongly or cannot be distinguished. Existing inter-cluster distance methods cannot determine whether two clusters are far apart or near. The geometric double-entity model (GDEM) is proposed to describe the far-near relations of clusters, and three measures (relative border distance, absolute border distance, and region dense degree) are designed to quantify how far or near two clusters are. GDEM considers both the absolute distance between the nearest sample sets and the dense degrees of the border regions of two clusters, and it can accurately uncover far-near relations of clusters in a high-dimensional space, especially under the difficult condition described above. Experimental results on four real data sets show that the proposed method effectively recognizes far-near relations of clusters, whereas conventional methods cannot.
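The abstract names the measures but does not give their formulas, so the following is only a rough sketch of the general idea (an absolute distance between nearest sample sets, rescaled by how densely the border regions are populated). Everything in it is an assumption made for illustration: the function names `pairwise_dist` and `border_measures`, the parameter `k`, the choice of the `k` closest samples as the "border set", and the use of mean nearest-neighbor spacing as a density proxy. It should not be read as the paper's definitions of GDEM.

```python
import numpy as np

def pairwise_dist(X, Y):
    """Euclidean distances between every row of X and every row of Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def border_measures(A, B, k=3):
    """Illustrative border-based distances between clusters A and B.

    ASSUMPTION: these are stand-ins, not the measures defined in the paper.
    Requires k >= 2 so that a spacing inside each border set exists.
    """
    D = pairwise_dist(A, B)
    # Border set of each cluster: its k samples closest to the other cluster.
    border_A = A[np.argsort(D.min(axis=1))[:k]]
    border_B = B[np.argsort(D.min(axis=0))[:k]]

    # "Absolute" border distance: mean distance between the two border sets.
    abs_dist = pairwise_dist(border_A, border_B).mean()

    def spacing(C):
        # Mean nearest-neighbor distance inside a border set (density proxy:
        # a smaller spacing means a denser border region).
        d = pairwise_dist(C, C)
        np.fill_diagonal(d, np.inf)
        return d.min(axis=1).mean()

    # "Relative" border distance: absolute distance in units of local spacing.
    scale = 0.5 * (spacing(border_A) + spacing(border_B))
    return abs_dist, abs_dist / scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 10))
    B = rng.normal(size=(50, 10))
    B[:, 0] += 3.0  # shift the second cluster along one axis
    print(border_measures(A, B))
```

Dividing by the local spacing is one simple way to make the distance relative to border density, so that the same gap counts as "far" between tight clusters and "near" between diffuse ones; the paper may well combine the two quantities differently.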

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, 350108, China
KaiJun Wang, XuanHui Yan & LiFei Chen

Corresponding author

Correspondence to KaiJun Wang.

About this article

Cite this article

Wang, K., Yan, X. & Chen, L. Geometric double-entity model for recognizing far-near relations of clusters. Sci. China Inf. Sci. 54, 2040–2050 (2011). https://doi.org/10.1007/s11432-011-4386-5

Received: 30 July 2010
Accepted: 08 March 2011
Published: 15 September 2011
Issue Date: October 2011
DOI: https://doi.org/10.1007/s11432-011-4386-5
