Geometric double-entity model for recognizing far-near relations of clusters

  • Research Papers
Science China Information Sciences

Abstract

Many practical problems require not only the sample labels produced by a clustering algorithm but also knowledge of which clusters lie far from or near to one another. When a high-dimensional data set contains many clusters, clustering visualization methods based on dimension reduction often show clusters that overlap, interlace, or are pushed apart; as a result, the far-near relations of some clusters are displayed wrongly or cannot be distinguished. The existing inter-cluster distance methods cannot determine whether two clusters are far apart or near. The geometric double-entity model method (GDEM) is proposed to describe far-near relations of clusters, and measures such as the relative border distance, absolute border distance, and region dense degree are designed to quantify the far-near degree between clusters. GDEM considers both the absolute distance between the nearest sample sets and the dense degrees of the border regions of two clusters, and it can accurately uncover far-near relations of clusters in a high-dimensional space, especially under the difficult condition described above. Experimental results on four real data sets show that the proposed method effectively recognizes far-near relations of clusters, whereas the conventional methods cannot.
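
The abstract names the ingredients of GDEM but does not give their formulas, so the following is only a rough sketch of the general idea and not the authors' definitions. It assumes, purely for illustration, that a cluster's "border set" is the k samples closest to the other cluster, that the absolute border distance is the mean distance between the two border sets, that the border spacing (a stand-in for a dense degree) is the mean nearest-neighbour distance inside a border set, and that a relative border distance is the ratio of the two. The names `pairwise_dist`, `border_measures`, and the parameter `k` are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def border_spacing(points):
    """Mean nearest-neighbour distance inside one border set (assumed density proxy)."""
    d = pairwise_dist(points, points)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    return d.min(axis=1).mean()

def border_measures(a, b, k=3):
    """Illustrative border measures between clusters a and b (not the paper's formulas).

    Requires k >= 2 and at least k samples per cluster; duplicate border
    samples would make the spacing zero and the ratio undefined.
    """
    d = pairwise_dist(a, b)
    # Assumed border sets: the k samples of each cluster nearest to the other cluster.
    border_a = a[np.argsort(d.min(axis=1))[:k]]
    border_b = b[np.argsort(d.min(axis=0))[:k]]
    # Assumed absolute border distance: mean distance between the border sets.
    absolute = pairwise_dist(border_a, border_b).mean()
    # Assumed local scale: average spacing of the two border sets.
    scale = 0.5 * (border_spacing(border_a) + border_spacing(border_b))
    # Assumed relative border distance: the gap measured in units of local spacing.
    return absolute, scale, absolute / scale

# Synthetic 10-dimensional clusters: one shifted slightly, one shifted a lot.
rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(50, 10))
near_cluster = rng.normal(0.0, 1.0, size=(50, 10)) + 3.0
far_cluster = rng.normal(0.0, 1.0, size=(50, 10)) + 30.0

near_abs, near_scale, near_rel = border_measures(base, near_cluster)
far_abs, far_scale, far_rel = border_measures(base, far_cluster)
```

Dividing the gap by the local spacing is one way to make "far" and "near" comparable across clusters of different density, which is the motivation the abstract gives for looking at border-region density as well as absolute distance.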

Author information

Corresponding author

Correspondence to KaiJun Wang.

About this article

Cite this article

Wang, K., Yan, X. & Chen, L. Geometric double-entity model for recognizing far-near relations of clusters. Sci. China Inf. Sci. 54, 2040–2050 (2011). https://doi.org/10.1007/s11432-011-4386-5

  • DOI: https://doi.org/10.1007/s11432-011-4386-5
